Abstract. We develop a general framework for the specification and 
implementation of systems whose executions are words, or partial or- 
ders, over an infinite alphabet. As a model of an implementation, we in- 
troduce class register automata, a one-way automata model over words 
with multiple data values. Our model combines register automata and 
class memory automata. It has natural interpretations. In particular, it 
captures communicating automata with an unbounded number of pro- 
cesses, whose semantics can be described as a set of (dynamic) message 
sequence charts. On the specification side, we provide a local existential 
monadic second-order logic that does not impose any restriction on the 
number of variables. We study the realizability problem and show that 
every formula from that logic can be effectively, and in elementary time, 
translated into an equivalent class register automaton. 



1 Introduction 

A recent research stream, motivated by models from XML database theory, 
considers data words, i.e., strings over an infinite alphabet [2UTD1IT4, 21, 23J. The 
alphabet is the cartesian product of a finite supply of labels and an infinite supply 
of data values. While labels may represent, e.g., an XML tag or reveal the type of 
an action that a system performs, data values can be used to model time stamps 
[TUIfTTirn)] , process identifiers [fJll2"5]. or text contents in XML documents [2]. 

We will consider data words as behavioral models of concurrent systems. In 
this regard, it is natural to look at suitable logics and automata. Logical formulas 
may serve as specifications, and automata as system models or tools for deciding 
logical theories. This viewpoint raises the following classical problems/tasks:
satisfiability (does a given logical formula have a model?), model checking (do
all executions of an automaton satisfy a given formula?), and realizability (given
a formula, construct a system model in terms of an automaton whose executions
are precisely the models of the formula). Much work has indeed gone into defining
logics and automata for data words, with a focus on satisfiability [51113] . 

One of the first logical approaches to data words is due to [10]. Since then, a
two-variable logic has become a commonly accepted yardstick with respect to
expressivity and decidability [5]. The logic contains a predicate to compare data
values of two positions for equality. Its satisfiability problem is decidable,
indeed, but supposedly of very high complexity. An elementary upper bound has
been obtained only for weaker fragments [S1[T3]. For specification of
communicating systems, however, two-variable logic is of limited use: it cannot
express properties like "whenever a process Pid1 spawns some Pid2, then this is
followed by a message from Pid2 to Pid1". Actually, the logic was studied for
words with only one data value at each position, which is not enough to encode
executions of message-passing systems. But three-variable logics as well as
extensions to two data values lead to undecidability. To put it bluntly, any
"interesting" logic for dynamic communicating systems has an undecidable
satisfiability problem.

Instead of satisfiability or model checking, we therefore consider
realizability. A system model that realizes a given formula can be considered
correct by construction. Realizability questions for data words have, so far,
been neglected. One reason may be that there is actually no automaton that could
serve as a realistic system model. Though data words naturally reflect executions
of systems with an unbounded number of threads, existing automata fail to model
distributed computation. Three features are minimum requirements for a suitable
system model. First, the automaton should be a one-way device, i.e., read an
execution once, processing it "from left to right" (unlike data automata [5],
class automata [3], two-way register automata, and pebble automata [21]). Second,
it should be non-deterministic (unlike alternating automata [14, 21]). Third,
it should reflect paradigms that are used in concurrent programming languages
such as process creation and message passing. Two known models match the first
two properties: register automata [17, 18, 25] and class memory automata [2]; but
they clearly do not fulfill the last requirement.

Contribution. We provide an existential MSO logic over data words, denoted 
rEMSO, which does not impose any restriction on the number of variables. The 
logic is strictly more expressive than the two-variable logic from [5] and suitable
to express interesting properties of dynamic communicating systems. 

We then define class register automata as a system model. They are a mix of 
register automata [17, 18, 25] and class memory automata [2]. A class register
automaton is a non-deterministic one-way device. Like a class memory automaton,
it can access certain configurations in the past. However, we extend the notion 
of a configuration, which is no longer a simple state but composed of a state and 
some data values that are stored in registers. This is common in concurrent pro- 
gramming languages and can be interpreted as "read current state of a process" 
or "send process identity from one to another process". Moreover, it is in the 
spirit of communicating finite-state machines [12] or nested-word automata pQ,
where more than one resource (state, channel, stack, etc.) can be accessed at a 
time. Actually, our automata run over directed acyclic graphs rather than words. 
To our knowledge, they are the first automata model of true concurrency that 
deals with structures over infinite alphabets. 

We study the realizability problem and show that, for every rEMSO formula, 
we can compute, in elementary time, an equivalent class register automaton. 
The effective translation is based on Hanf's locality theorem [16] and properly
generalizes [TIE] to a dynamic setting with unbounded process creation. 






Outline. Sections 2 and 3 introduce data words and their logics. In Section 4,
we define the new automata model. Section 5 is devoted to the realizability
problem and states our main result. In Section 6, we give translations from
automata back to logic. An extension of our main result to infinite data words
is discussed in Section 7. We conclude in Section 8.

2 Data Words 

Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ denote the set of natural numbers. For
$m \in \mathbb{N}$, we denote by $[m]$ the set $\{1, \ldots, m\}$. A boolean
formula over a (possibly infinite) set $A$ of atoms is a finite object generated
by the grammar
$\beta ::= \mathit{true} \mid \mathit{false} \mid a \in A \mid \neg\beta \mid \beta \vee \beta \mid \beta \wedge \beta$.
For an assignment of truth values to elements of $A$, a boolean formula $\beta$
is evaluated to true or false as usual. Its size $|\beta|$ is the number of
vertices of its syntax tree. Moreover, $|A| \in \mathbb{N} \cup \{\infty\}$
denotes the size of a set $A$. The symbol $\cong$ will be used to denote
isomorphism of two structures. For a partial function $f$, the domain of $f$ is
denoted by $\mathrm{dom}(f)$.

We fix an infinite set $\mathfrak{D}$ of data values. Note that $\mathfrak{D}$
can be any infinite set. For examples, however, we usually choose
$\mathfrak{D} = \mathbb{N}$. In a data word, every position will carry
$m \geq 0$ data values. It will also carry a label from a non-empty finite
alphabet $\Sigma$. Thus, a data word is a finite sequence over
$\Sigma \times \mathfrak{D}^m$ (over $\Sigma$ if $m = 0$). Given a data word
$w = (a_1, d_1) \ldots (a_n, d_n)$ with $a_i \in \Sigma$ and
$d_i = (d_i^1, \ldots, d_i^m) \in \mathfrak{D}^m$, we let $\ell(i)$ refer to
label $a_i$ and $d^k(i)$ to data value $d_i^k$.

Classical words without data come with natural relations on word positions
such as the direct successor relation $\prec_{+1}$ and its transitive closure
$<$. In the context of data words with one data value (i.e., $m = 1$), it is
natural to consider also a relation $\prec_\sim$ for successive positions with
identical data values [5]. As, in the present paper, we deal with multiple data
values, we generalize these notions in terms of a signature. A signature
$\mathcal{S}$ is a pair $(\sigma, \mathcal{I})$. It consists of a finite set
$\sigma$ of binary relation symbols and an interpretation $\mathcal{I}$. The
latter associates, with every $\lhd \in \sigma$ and every data word
$w = w_1 \ldots w_n \in (\Sigma \times \mathfrak{D}^m)^*$, a relation
$\lhd^w \subseteq [n] \times [n]$ such that the following hold, for all word
positions $i, j, i', j' \in [n]$:

(1) $i \lhd^w j$ implies $i < j$

(2) there is at most one $k$ such that $i \lhd^w k$

(3) there is at most one $k$ such that $k \lhd^w i$

(4) if $i \lhd^w j$ and $i' \lhd^w j'$ and $w_i = w_{i'}$ and $w_j = w_{j'}$, then $i < i'$ iff $j < j'$

In other words, we require that $\lhd^w$ (1) complies with $<$, (2) has
out-degree at most one, (3) has in-degree at most one, and (4) is monotone. Our
translation from logic into automata will be symbolic and independent of
$\mathcal{I}$, but its applicability and correctness rely upon the above
conditions. However, several examples will demonstrate that the framework is
quite flexible and allows us to capture existing logics and automata for data
words. Note that $\lhd^w$ can indeed be any relation satisfying (1)-(4). It
could even assume an order on $\mathfrak{D}$.

As the interpretation $\mathcal{I}$ is mostly understood, we may identify
$\mathcal{S}$ with $\sigma$ and write $\lhd \in \mathcal{S}$ instead of
$\lhd \in \sigma$, or $|\mathcal{S}|$ to denote $|\sigma|$. If not stated
otherwise, we let in the following $\mathcal{S}$ be any signature.
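
The four requirements on an interpreted relation can be checked mechanically. The following is a minimal sketch, not part of the paper: it assumes a data word is given as a Python list of letters `(label, data_tuple)` and a relation as a set of 1-based position pairs. The function name `satisfies_signature_conditions` is hypothetical.

```python
from itertools import product


def satisfies_signature_conditions(word, rel):
    """Check conditions (1)-(4) for one interpreted relation.

    word: list of letters (label, data_tuple); positions are 1-based.
    rel:  set of pairs (i, j) of positions.
    """
    n = len(word)
    # (1) every edge goes forward in the word
    if any(not (1 <= i < j <= n) for i, j in rel):
        return False
    # (2) out-degree at most one, (3) in-degree at most one
    sources = [i for i, _ in rel]
    targets = [j for _, j in rel]
    if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
        return False
    # (4) monotonicity among edges whose endpoints carry identical letters
    for (i, j), (i2, j2) in product(rel, repeat=2):
        same_letters = word[i - 1] == word[i2 - 1] and word[j - 1] == word[j2 - 1]
        if same_letters and (i < i2) != (j < j2):
            return False
    return True
```

For instance, on a four-letter word the direct successor relation `{(1, 2), (2, 3), (3, 4)}` passes, whereas `{(1, 2), (1, 3)}` fails condition (2).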
Example 1. Typical examples of relation symbols include $\prec_{+1}$ and
$\prec^k_\sim$, relating direct successors and, respectively, successive
positions with the same $k$-th data value: For $w = w_1 \ldots w_n$, we let
$\prec^w_{+1} = \{(i, i+1) \mid i \in \{1, \ldots, n-1\}\}$ and
$(\prec^k_\sim)^w = \{(i, j) \mid 1 \le i < j \le n,\ d^k(i) = d^k(j)$, and
there is no $i < i' < j$ such that $d^k(i) = d^k(i')\}$. When $m = 1$, we write
$\prec_\sim$ instead of $\prec^1_\sim$. Automata and logic have been well
studied in the presence of one single data value ($m = 1$) and for signature
$\mathcal{S}^1_{+1,\sim} = \{\prec_{+1}, \prec_\sim\}$ with the above
interpretation [2, 5]. Here, and in the following, we adopt the convention that
the upper index of a signature denotes the number $m$ of data values. Figure 1
depicts a data word over $\Sigma = \{r, a\}$ (request/acknowledgment) and
$\mathfrak{D} = \mathbb{N}$ as well as the relations $\prec_{+1}$ (straight
arrows) and $\prec_\sim$ (curved arrows) imposed by $\mathcal{S}^1_{+1,\sim}$.
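
As an illustration of Example 1, the two relations can be computed in one pass over the word. This is a sketch under the assumption that a data word is a list of letters `(label, data_tuple)` with 1-based positions; the helper names `succ_relation` and `class_succ_relation` are hypothetical.

```python
def succ_relation(word):
    """Direct successor relation: pairs (i, i + 1) of 1-based positions."""
    return {(i, i + 1) for i in range(1, len(word))}


def class_succ_relation(word, k):
    """Successive positions carrying the same k-th data value (k is 1-based)."""
    last_seen = {}  # data value -> latest position that carried it
    rel = set()
    for i, (_label, data) in enumerate(word, start=1):
        d = data[k - 1]
        if d in last_seen:
            rel.add((last_seen[d], i))
        last_seen[d] = i
    return rel
```

On the word (r,8)(r,5)(a,8)(a,5), the class successor relation for k = 1 links position 1 to 3 and position 2 to 4.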

Example 2. We develop a framework for message-passing systems with dynamic
process creation. Each process has a unique identifier from
$\mathfrak{D} = \mathbb{N}$. Process $c \in \mathbb{N}$ can execute an action
$\mathrm{f}(c, d)$, which forks a new process with identity $d$. This action is
eventually followed by $\mathrm{n}(d, c)$, indicating that $d$ is new (created
by $c$) and begins its execution. Processes can exchange messages. When $c$
executes $!(c, d)$, it sends a message through an unbounded first-in-first-out
(FIFO) channel $c \to d$. Process $d$ may execute $?(d, c)$ to receive the
message. Elements from $\Sigma_{\mathrm{dyn}} = \{\mathrm{f}, \mathrm{n}, !, ?\}$
reveal the nature of an action, which requires two identities so that we choose
$m = 2$. When a process performs an action, it should access the current state
of (i) its own, (ii) the spawning process if a new-action is executed, and
(iii) the sending process if a receive is executed (message contents are encoded
in states). To this aim, we define a signature
$\mathcal{S}^2_{\mathrm{dyn}} = \{\prec_{\mathrm{proc}}, \prec_{\mathrm{fork}}, \prec_{\mathrm{msg}}\}$
with the following interpretation. Assume
$w = w_1 \ldots w_n \in (\Sigma_{\mathrm{dyn}} \times \mathbb{N} \times \mathbb{N})^*$
and consider, for $a, b \in \Sigma_{\mathrm{dyn}}$ and $i, j \in [n]$, the
property

$P_{(a,b)}(i,j) = (\ell(i) = a \wedge \ell(j) = b \wedge d^1(i) = d^2(j) \wedge d^2(i) = d^1(j))$.

We set $\prec^w_{\mathrm{proc}} = (\prec^1_\sim)^w$, which relates successive
positions with the same executing process. Moreover, let
$i \prec^w_{\mathrm{fork}} j$ if $i < j$, $P_{(\mathrm{f},\mathrm{n})}(i,j)$,
and there is no $i < k < j$ such that $P_{(\mathrm{f},\mathrm{n})}(i,k)$ or
$P_{(\mathrm{f},\mathrm{n})}(k,j)$. Finally, we set $i \prec^w_{\mathrm{msg}} j$
if $i < j$, $P_{(!,?)}(i,j)$, and

$|\{i' < i \mid P_{(!,?)}(i',j)\}| = |\{j' < j \mid P_{(!,?)}(i,j')\}|$.

This models FIFO communication. An example data word is given in Figure 2,
which also depicts the relations induced by $\mathcal{S}^2_{\mathrm{dyn}}$.
Horizontal arrows reflect $\prec_{\mathrm{proc}}$, vertical arrows either
$\prec_{\mathrm{fork}}$ or $\prec_{\mathrm{msg}}$, depending on the labels. Note
that $\mathrm{n}(2, 2)$ is executed by "root process" 2, which was not spawned
by some other process.
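
The FIFO matching of sends and receives can be illustrated with a small quadratic-time sketch. It assumes a data word is a list of letters `(label, (d1, d2))` with 1-based positions and reads the counting condition as: the number of earlier sends matching receive j equals the number of earlier receives matching send i. The names `matches` and `msg_relation` are hypothetical.

```python
def matches(word, a, b, i, j):
    """Property P_(a,b)(i, j): labels are a and b, and the two
    data values at positions i and j are mirrored."""
    (label_i, data_i), (label_j, data_j) = word[i - 1], word[j - 1]
    return (label_i == a and label_j == b
            and data_i[0] == data_j[1] and data_i[1] == data_j[0])


def msg_relation(word):
    """Pairs (send, receive) that are matched in FIFO order."""
    n = len(word)
    rel = set()
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if not matches(word, "!", "?", i, j):
                continue
            # sends on the same channel that occur before position i
            earlier_sends = sum(matches(word, "!", "?", i2, j) for i2 in range(1, i))
            # receives on the same channel that occur before position j
            earlier_recvs = sum(matches(word, "!", "?", i, j2) for j2 in range(1, j))
            if earlier_sends == earlier_recvs:
                rel.add((i, j))
    return rel
```

If process 1 sends twice to process 2 before process 2 receives twice, the first send is matched with the first receive and the second with the second.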
Graph Abstraction. Note that the graph induced by the data word from
Figure 2 does not resemble a word anymore, as the direct successor relation on
word positions is abandoned. Actually, we can see data words from a different
angle. A signature $\mathcal{S}$ determines a class of data graphs $\mathcal{G}$
with $(\Sigma \times \mathfrak{D}^m)$-labeled nodes and $\mathcal{S}$-labeled
edges. A data graph is contained in $\mathcal{G}$ if it can be "squeezed" into a
word $w$ such that nodes that are connected by a $\lhd$-labeled edge turn into
word positions that are related by $\lhd^w$. In other words, we consider
directed acyclic graphs such that at least one linearization (extension to a
total order) matches the requirements imposed by the signature.

Our principal proof technique relies on a graph abstraction of data words
where data values are classified into equivalence classes. Let
$\mathrm{Part}(m)$ be the set of all partitions of $[m]$. An
$\mathcal{S}$-graph is a (node- and edge-labeled) graph
$G = (V, (\lhd^G)_{\lhd \in \mathcal{S}}, \lambda, \nu)$. Here, $V$ is the
finite set of nodes, $\lambda : V \to \Sigma$ and $\nu : V \to \mathrm{Part}(m)$
are node-labeling functions, and each $\lhd^G \subseteq V \times V$ is a set of
edges such that, for all $i \in V$, there is at most one $j \in V$ with
$i \lhd^G j$, and there is at most one $j \in V$ with $j \lhd^G i$. We represent
$\lhd^G$ and $(\lhd^G)^{-1}$ as partial functions and set
$\mathrm{next}^G_\lhd(i) = j$ if $i \lhd^G j$, and $\mathrm{prev}^G_\lhd(i) = j$
if $j \lhd^G i$.

Local graph patterns, so-called spheres, will also play a key role. For nodes
$i, j \in V$, we denote by $\mathrm{dist}^G(i,j)$ the distance between $i$ and
$j$, i.e., the length of the shortest path from $i$ to $j$ in the undirected
graph $(V, \bigcup_{\lhd \in \mathcal{S}} \lhd^G \cup (\lhd^G)^{-1})$ (if such a
path exists). In particular, $\mathrm{dist}^G(i,i) = 0$. For some radius
$B \in \mathbb{N}$, the $B$-sphere of $G$ around $i$, denoted by
$B\text{-}\mathrm{Sph}^G(i)$, is the substructure of $G$ induced by
$\{j \in V \mid \mathrm{dist}^G(i,j) \le B\}$. In addition, it contains the
distinguished element $i$ as a constant, called sphere center.

These notions naturally transfer to data words: With word $w$ of length $n$,
we associate the graph
$G(w) = ([n], (\lhd^w)_{\lhd \in \mathcal{S}}, \lambda, \nu)$ where $\lambda$
maps $i$ to $\ell(i)$ and $\nu$ maps $i$ to
$\{\{l \in [m] \mid d^k(i) = d^l(i)\} \mid k \in [m]\}$. Thus, $K \in \nu(i)$
contains indices with the same data value at position $i$. Now,
$\mathrm{next}^w_\lhd$, $\mathrm{prev}^w_\lhd$, $\mathrm{dist}^w$, and
$B\text{-}\mathrm{Sph}^w(i)$ are defined with reference to the graph $G(w)$. We
hereby assume that $\mathcal{S}$ is understood. We might also omit the index $w$
if it is clear from the context.
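
The graph abstraction can be made concrete with a breadth-first search. The sketch below assumes that the relations of the signature are given as sets of 1-based position pairs and that a position's data is a tuple. The names `partition_label`, `distances_from`, and `sphere_nodes` are hypothetical, and only the node set of a sphere is computed (not its labeling).

```python
from collections import deque


def partition_label(data):
    """Partition of the indices 1..m into blocks with equal data values."""
    m = len(data)
    return frozenset(
        frozenset(l for l in range(1, m + 1) if data[l - 1] == data[k - 1])
        for k in range(1, m + 1)
    )


def distances_from(n, relations, source):
    """Distances from `source` in the undirected graph on positions 1..n."""
    adjacent = {i: set() for i in range(1, n + 1)}
    for rel in relations:
        for i, j in rel:
            adjacent[i].add(j)  # forget edge direction and edge label
            adjacent[j].add(i)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacent[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # unreachable positions are absent


def sphere_nodes(n, relations, center, radius):
    """Positions at distance at most `radius` from `center`."""
    dist = distances_from(n, relations, center)
    return {j for j, d in dist.items() if d <= radius}
```

A sphere of radius 0 always consists of the center alone, which matches the convention that the distance from a node to itself is 0.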

Data words $u$ and $v$ are called ($\mathcal{S}$-)equivalent if
$G(u) \cong G(v)$. For a language $L$, we let $[L]_{\mathcal{S}}$ denote the set
of words that are equivalent to some word in $L$.

Given the data word $w$ from Figure 1, we have $\mathrm{dist}^w(1,8) = 3$. The
picture on the right shows $1\text{-}\mathrm{Sph}^w(4)$. The sphere center is
framed by a rectangle; node labelings of the form $\{\{1\}\}$ are omitted.

3 Logic 

We consider monadic second-order logic to specify properties of data words. Let 
us fix countably infinite supplies of first-order variables {x, y, . . .} and second- 
order variables {X, Y, . . .}. 

The set $\mathrm{MSO}(\mathcal{S})$ of monadic second-order formulas is given by
the grammar

$\varphi ::= \ell(x) = a \mid d^k(x) = d^l(y) \mid x \lhd y \mid x = y \mid x \in X \mid \neg\varphi \mid \varphi \vee \varphi \mid \exists x\, \varphi \mid \exists X\, \varphi$

where $a \in \Sigma$, $k, l \in [m]$, $\lhd \in \mathcal{S}$, $x$ and $y$ are
first-order variables, and $X$ is a second-order variable. The size $|\varphi|$
of $\varphi$ is the number of nodes of its syntax tree.

Important fragments of $\mathrm{MSO}(\mathcal{S})$ are
$\mathrm{FO}(\mathcal{S})$, the set of first-order formulas, which do not use
any second-order quantifier, and $\mathrm{EMSO}(\mathcal{S})$, the set of
formulas of the form $\exists X_1 \ldots \exists X_n\, \varphi$ with
$\varphi \in \mathrm{FO}(\mathcal{S})$.

The models of a formula are data words. First-order variables are interpreted
as word positions and second-order variables as sets of positions. Formula
$\ell(x) = a$ holds in data word $w$ if position $x$ carries an $a$, and formula
$d^k(x) = d^l(y)$ holds if the $k$-th data value at position $x$ equals the
$l$-th data value at position $y$. Moreover, $x \lhd y$ is satisfied if
$x \lhd^w y$. The atomic formulas $x = y$ and $x \in X$ as well as
quantification and boolean connectives are interpreted as usual.

For realizability, we will actually consider a restricted, more "local" logic:
let $\mathrm{rMSO}(\mathcal{S})$ denote the fragment of
$\mathrm{MSO}(\mathcal{S})$ where we can only use $d^k(x) = d^l(x)$ instead of
the more general $d^k(x) = d^l(y)$. Thus, data values of distinct positions can
only be compared via $x \lhd y$. This implies that $\mathrm{rMSO}(\mathcal{S})$
cannot distinguish between words $u$ and $v$ such that $G(u) \cong G(v)$. The
fragments $\mathrm{rFO}(\mathcal{S})$ and $\mathrm{rEMSO}(\mathcal{S})$ of
$\mathrm{rMSO}(\mathcal{S})$ are defined as expected.

In the case of one data value ($m = 1$), we will also refer to the logic
$\mathrm{EMSO}^2(\mathcal{S}^1_{+1,\sim} \cup \{<\})$ that was considered in [5]
and restricts EMSO logic to two first-order variables. The predicate $<$ is
interpreted as the strict linear order on word positions (strictly speaking, it
is not part of a signature as we defined it). We shall later see that
$\mathrm{rEMSO}(\mathcal{S}^1_{+1,\sim})$ is strictly more expressive than
$\mathrm{EMSO}^2(\mathcal{S}^1_{+1,\sim} \cup \{<\})$, though the latter
involves the non-local predicates $d^1(x) = d^1(y)$ and $<$. This gain in
expressiveness comes at the price of an undecidable satisfiability problem.

A sentence is a formula without free variables. The language defined by
sentence $\varphi$, i.e., the set of its models, is denoted by $L(\varphi)$. By
$\mathrm{MSO}(\mathcal{S})$, $\mathrm{rMSO}(\mathcal{S})$,
$\mathrm{rEMSO}(\mathcal{S})$, etc., we refer to the corresponding language
classes.



Example 3. Think of a server that can receive requests ($r$) from an unbounded
number of processes, and acknowledge ($a$) them. We let $\Sigma = \{r, a\}$,
$\mathfrak{D} = \mathbb{N}$, and $m = 1$. A data value from $\mathfrak{D}$ is
used to model the process identity of the requesting and acknowledged process.
We present three properties formulated in
$\mathrm{rFO}(\mathcal{S}^1_{+1,\sim})$. Formula
$\varphi_1 = \exists x \exists y\, (\ell(x) = r \wedge \ell(y) = a \wedge x \prec_\sim y)$
expresses that there is a request that is acknowledged. Dually,
$\varphi_2 = \forall x \exists y\, (\ell(x) = r \to \ell(y) = a \wedge x \prec_\sim y)$
says that every request is acknowledged before the same process sends another
request. A last formula guarantees that two successive requests are
acknowledged in the order they were received:

$\varphi_3 = \forall x, y\, \big( \ell(x) = r \wedge \ell(y) = r \wedge x \prec_{+1} y \;\to\; \exists x', y'\, (\ell(x') = a \wedge \ell(y') = a \wedge x \prec_\sim x' \prec_{+1} y' \wedge y \prec_\sim y') \big)$

This is not expressible in
$\mathrm{EMSO}^2(\mathcal{S}^1_{+1,\sim} \cup \{<\})$. We will see that
$\varphi_1, \varphi_2, \varphi_3$ form a hierarchy of languages that correspond
to different automata models, our new model capturing $\varphi_3$.
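
To see what the three request/acknowledgment properties demand of a concrete word, they can be evaluated directly. The sketch below assumes m = 1, a word given as a list of pairs `(label, value)`, and the reading of the third property as: whenever two requests are direct neighbors, their class successors are acknowledgments that are direct neighbors in the same order. The function names are hypothetical.

```python
def relations(word):
    """Direct successor and class successor as partial functions (dicts)."""
    n = len(word)
    succ = {i: i + 1 for i in range(1, n)}
    cls, last = {}, {}
    for i, (_label, d) in enumerate(word, start=1):
        if d in last:
            cls[last[d]] = i
        last[d] = i
    return succ, cls


def phi1(word):
    """Some request is acknowledged at the next position of its process."""
    _, cls = relations(word)
    return any(word[x - 1][0] == "r" and word[y - 1][0] == "a"
               for x, y in cls.items())


def phi2(word):
    """Every request is acknowledged at the next position of its process."""
    _, cls = relations(word)
    return all(label != "r" or (x in cls and word[cls[x] - 1][0] == "a")
               for x, (label, _d) in enumerate(word, start=1))


def phi3(word):
    """Neighboring requests are acknowledged by neighboring positions, in order."""
    succ, cls = relations(word)
    label = lambda i: word[i - 1][0]
    for x, y in succ.items():
        if label(x) == "r" and label(y) == "r":
            x2, y2 = cls.get(x), cls.get(y)
            if (x2 is None or y2 is None or label(x2) != "a"
                    or label(y2) != "a" or succ.get(x2) != y2):
                return False
    return True
```

The word (r,8)(r,5)(a,8)(a,5) satisfies all three properties, while (r,8)(r,5)(a,5)(a,8) satisfies the first two but violates the third because the acknowledgments arrive in reversed order.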
Example 4. We pursue Example 2 and consider $\Sigma_{\mathrm{dyn}}$ with
signature $\mathcal{S}^2_{\mathrm{dyn}}$. Recall that we wish to model systems
where an unbounded number of processes communicate via message-passing through
unbounded FIFO channels. Obviously, not every data word represents an execution
of such a system. Therefore, we identify some well formed data words, which
have to satisfy
$\varphi_1 \wedge \varphi_2 \wedge \varphi_3 \in \mathrm{rFO}(\mathcal{S}^2_{\mathrm{dyn}})$
given as follows. We require that there is exactly one root process:
$\varphi_1 = \exists x\, (\ell(x) = \mathrm{n} \wedge d^1(x) = d^2(x) \wedge \forall y\, (d^1(y) = d^2(y) \to x = y))$.
Next, we assume that every fork is followed by a corresponding new-action, the
first action of a process is a new-event, and every new process was forked by
some other process:

$\varphi_2 = \forall x\, \big( (\ell(x) = \mathrm{f} \to \exists y\, (x \prec_{\mathrm{fork}} y)) \wedge (\ell(x) = \mathrm{n} \leftrightarrow \neg\exists y\, (y \prec_{\mathrm{proc}} x)) \wedge (\ell(x) = \mathrm{n} \to (d^1(x) = d^2(x) \vee \exists y\, (y \prec_{\mathrm{fork}} x))) \big)$

Finally, every send should be followed by a receive, and a receive be preceded
by a send action:
$\varphi_3 = \forall x\, (\ell(x) \in \{!, ?\} \to \exists y\, (x \prec_{\mathrm{msg}} y \vee y \prec_{\mathrm{msg}} x))$.
This formula actually ensures that, for every $c, d \in \mathbb{N}$, there are
as many symbols $!(c, d)$ as $?(d, c)$, the $N$-th send symbol being matched
with the $N$-th receive symbol. We call a data word over
$\Sigma_{\mathrm{dyn}}$ and $\mathcal{S}^2_{\mathrm{dyn}}$ a message sequence
chart (MSC, for short) if it satisfies
$\varphi_1 \wedge \varphi_2 \wedge \varphi_3$. Figure 2 shows an MSC and the
induced relations. When we restrict to MSCs, our logic corresponds to that from
[20]. Note that model checking $\mathrm{rMSO}(\mathcal{S}^2_{\mathrm{dyn}})$
specifications against fork-and-join grammars, which can generate infinite sets
of MSCs, is decidable [20].

A last $\mathrm{rFO}(\mathcal{S}^2_{\mathrm{dyn}})$-formula (which is not
satisfied by all MSCs) specifies that, whenever a process $c$ forks some $d$,
then this is followed by a message from $d$ to $c$:
$\forall x_1, y_1\, (x_1 \prec_{\mathrm{fork}} y_1 \to \exists x_2, y_2\, (x_1 \prec_{\mathrm{proc}} x_2 \wedge y_1 \prec_{\mathrm{proc}} y_2 \prec_{\mathrm{msg}} x_2))$.

4 Class Register Automata 

In this section, we define class register automata, a non-deterministic one-way
automata model that captures rEMSO logic. It combines register automata [17, 18]
and class memory automata [2]. When processing a data word, data values from
the current position can be stored in registers. The automaton reads the data 
word from left to right but can look back on certain states and register contents 
from the past (e.g., at the last position that is executed by the same process). 
Positions that can be accessed in this way are determined by the signature §. 
Their register entries can be compared with one another, or with current values 
from the input. Moreover, when taking a transition, registers can be updated by 
either a current value, an old register entry, or a guessed value. 

Definition 1. A class register automaton (over signature $\mathcal{S}$) is a
tuple $\mathcal{A} = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathcal{S}}, \Phi)$
where

— $Q$ is a finite set of states,

— $R$ is a finite set of registers,

— the $F_\lhd \subseteq Q$ are sets of local final states,

— $\Phi$ is the global acceptance condition: a boolean formula over
$\{\text{'}q \le N\text{'} \mid q \in Q \text{ and } N \in \mathbb{N}\}$, and

— $\Delta$ is a finite set of transitions of the form
$(p, g) \xrightarrow{a} (q, f)$.

Here, $p : \mathcal{S} \rightharpoonup Q$ is a partial mapping representing the
source states. Moreover, $g$ is a guard, i.e., a boolean formula over
$\{\text{'}\theta_1 = \theta_2\text{'} \mid \theta_1, \theta_2 \in [m] \cup (\mathrm{dom}(p) \times R)\}$
to perform comparisons of values that are currently read and those that are
stored in registers. Finally, $a \in \Sigma$ is the current label, $q \in Q$ is
the target state, and
$f : R \rightharpoonup (\mathrm{dom}(p) \times R) \cup ([m] \times \mathbb{N})$
is a partial mapping to update registers.

In the following, we write $p_\lhd$ instead of $p(\lhd)$. Transition
$(p, g) \xrightarrow{a} (q, f)$ can be executed at position $i$ of a data word
if the state at position $\mathrm{prev}_\lhd(i)$ is $p_\lhd$ (for all
$\lhd \in \mathrm{dom}(p)$) and, for a register guard
$(\lhd_1, r_1) = (\lhd_2, r_2)$, the entry of register $r_1$ at
$\mathrm{prev}_{\lhd_1}(i)$ equals that of $r_2$ at $\mathrm{prev}_{\lhd_2}(i)$.
The automaton then reads the label $a$ together with a tuple of data values
that also passes the test given by $g$, and goes to $q$. Moreover, register $r$
obtains a new value according to $f(r)$: if
$f(r) = (\lhd, r') \in \mathrm{dom}(p) \times R$, then the new value of $r$ is
the value of $r'$ at position $\mathrm{prev}_\lhd(i)$; if
$f(r) = (k, B) \in [m] \times \mathbb{N}$, then $r$ obtains any $k$-th data
value in the $B$-sphere around $i$. In particular, $f(r) = (k, 0)$ assigns to
$r$ the (unique) $k$-th data value of the current position. To some extent,
$f(r) = (k, B)$ calls an oracle to guess a data value. The guess is local and,
therefore, weaker than [18], where a non-deterministic reassignment allows one
to write any data value into a register. This latter approach can indeed
simulate our local version (this is not immediately clear, but can be shown
using the sphere automaton from Section 5).

Let us be more precise. A configuration of $\mathcal{A}$ is a pair $(q, \rho)$
where $q \in Q$ is the current state and $\rho : R \rightharpoonup \mathfrak{D}$
is a partial mapping denoting the current register contents. If $\rho(r)$ is
undefined, then there is no entry in $r$. Let
$w = w_1 \ldots w_n \in (\Sigma \times \mathfrak{D}^m)^*$ be a data word and
$\xi = (q_1, \rho_1) \ldots (q_n, \rho_n)$ be a sequence of configurations. For
$i \in [n]$, $k \in [m]$, and $B \in \mathbb{N}$, let
$\mathfrak{D}^k_B(i) = \{d^k(j) \mid j \in [n] \text{ such that } \mathrm{dist}^w(i,j) \le B\}$.
We call $\xi$ a run of $\mathcal{A}$ on $w$ if, for every position $i \in [n]$,
there is a transition $(p_i, g_i) \xrightarrow{\ell(i)} (q_i, f_i)$ such that
the following hold:

(1) $\mathrm{dom}(p_i) = \{\lhd \in \mathcal{S} \mid \mathrm{prev}_\lhd(i) \text{ is defined}\}$

(2) for all $\lhd \in \mathrm{dom}(p_i)$: $(p_i)_\lhd = q_{\mathrm{prev}_\lhd(i)}$

(3) $g_i$ is evaluated to true on the basis of its atomic subformulas:
$\theta_1 = \theta_2$ is true iff
$\mathrm{val}_i(\theta_1) = \mathrm{val}_i(\theta_2) \in \mathfrak{D}$ where
$\mathrm{val}_i(k) = d^k(i)$ and
$\mathrm{val}_i((\lhd, r)) = \rho_{\mathrm{prev}_\lhd(i)}(r)$ (the latter might
be undefined and, therefore, not be in $\mathfrak{D}$)

(4) for all $r \in R$:
    $\rho_i(r) = \rho_{\mathrm{prev}_\lhd(i)}(r')$ if $f_i(r) = (\lhd, r') \in \mathrm{dom}(p_i) \times R$
    $\rho_i(r) \in \mathfrak{D}^k_B(i)$ if $f_i(r) = (k, B) \in [m] \times \mathbb{N}$
    $\rho_i(r)$ undefined if $f_i(r)$ undefined
Run $\xi$ is accepting if $q_i \in F_\lhd$ for all $i \in [n]$ and
$\lhd \in \mathcal{S}$ such that $\mathrm{next}_\lhd(i)$ is undefined. Moreover,
we require that the global condition $\Phi$ is met. Hereby, an atomic
constraint $q \le N$ is satisfied by $\xi$ if
$|\{i \in [n] \mid q_i = q\}| \le N$. The language
$L(\mathcal{A}) \subseteq (\Sigma \times \mathfrak{D}^m)^*$ of $\mathcal{A}$ is
defined in the obvious manner. The corresponding language class is denoted by
$\mathrm{CRA}(\mathcal{S})$.

The acceptance conditions are inspired by Björklund and Schwentick [2], who
also distinguish between local and global acceptance. Local final states can be
motivated as follows. When data values model process identities, a
$\prec_\sim$-maximal position of a data word is the last position of some
process and must give rise to a local final state. Moreover, in the context of
$\mathcal{S}^2_{\mathrm{dyn}}$, a sending position that does not lead to a local
final state in $F_{\prec_{\mathrm{msg}}}$ requires a matching receive event.
Thus, local final states can be used to model "communication requests". The
global acceptance condition of class register automata is more general than
that of [2] to cope with all possible signatures. However, in the special case
of $\mathcal{S}^1_{+1,\sim}$, there is some global control in terms of
$\prec_{+1}$. We could then perform some counting up to a finite threshold and
restrict, like [2], to a set of global final states.

We can classify many of the non-deterministic one-way models from the
literature (most of them defined for $m = 1$) in our unifying framework:

— A class memory automaton [2] is a class register automaton where, in all
transitions $(p, g) \xrightarrow{a} (q, f)$, the update function $f$ is
undefined everywhere. The corresponding language class is denoted by
$\mathrm{CMA}(\mathcal{S})$.

— As an intermediary subclass of class register automata, we consider
non-guessing class register automata: for all transitions
$(p, g) \xrightarrow{a} (q, f)$ and registers $r$, one requires
$f(r) \in (\mathrm{dom}(p) \times R) \cup ([m] \times \{0\})$. We denote the
corresponding language class by $\mathrm{CRA}^-(\mathcal{S})$.

— A register automaton [14, 17] is a non-guessing class register automaton over
$\mathcal{S}^1_{+1} = \{\prec_{+1}\}$. Moreover, non-guessing class register
automata over $\mathcal{S}^1_{+1,\sim}$ capture fresh-register automata [25],
which can dynamically generate data values that do not occur in the history of
a run. Actually, this feature is also present in dynamic communicating automata
[6] and in class memory automata over $\mathcal{S}^1_{+1,\sim}$ where a fresh
data value is guaranteed by a transition $(p, g) \xrightarrow{a} (q, f)$ such
that $p_{\prec_\sim}$ is undefined.

— Class register automata are a model of distributed computation: considered
over $\Sigma_{\mathrm{dyn}}$ and $\mathcal{S}^2_{\mathrm{dyn}}$, they subsume
dynamic communicating automata [6]. In particular, they can handle unbounded
process creation and message passing. Updates of the form
$f(r) = (\prec_{\mathrm{fork}}, r')$ and $f(r) = (\prec_{\mathrm{msg}}, r')$
correspond to receiving a process identity from the spawning/sending process.
Moreover, when a process requests a message from the thread whose identity is
stored in register $r$, a corresponding transition is guarded by
$(\prec_{\mathrm{proc}}, r) = (\prec_{\mathrm{msg}}, r_0)$ where we assume that
every process keeps its identity in some register $r_0$.

Example 5. Let us give a concrete example. Suppose $\Sigma = \{r, a\}$ and
$\mathfrak{D} = \mathbb{N}$. We pursue Example 3 and build a non-guessing class
register automaton $\mathcal{A}$ over $\mathcal{S}^1_{+1,\sim}$ for
$L = [\{(r,1) \ldots (r,n)(a,1) \ldots (a,n) \mid n \ge 1\}]_{\mathcal{S}^1_{+1,\sim}}$.
Roughly speaking,



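Transitions

      source (p)
      $\prec_\sim$   $\prec_{+1}$   guard (g)                                  input    q       update (f)
  1                                                                            (r, d)   $q_1$   $r_1 := d$
  2                  $q_1$                                                     (r, d)   $q_1$   $r_1 := d$;  $r_2 := (\prec_{+1}, r_1)$
  3   $q_1$          $q_1$          $(\prec_\sim, r_2) = \bot$                 (a, d)   $q_2$   $r_1 := d$
  4   $q_1$          $q_2$          $(\prec_\sim, r_2) = (\prec_{+1}, r_1)$    (a, d)   $q_2$   $r_1 := d$

Run

  input    state   $r_1$   $r_2$
  (r, 8)   $q_1$   8       $\bot$
  (r, 5)   $q_1$   5       8
  (a, 8)   $q_2$   8       $\bot$
  (a, 5)   $q_2$   5       $\bot$

Fig. 3. A non-guessing class register automaton over $\mathcal{S}^1_{+1,\sim}$ and a run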



there is a request phase followed by an acknowledgment phase, and requests
are acknowledged in the order they are received. Figure 3 presents $\mathcal{A}$
and an accepting run on $(r,8)(r,5)(a,8)(a,5)$. The states of $\mathcal{A}$ are
$q_1$ and $q_2$. State $q_1$ is assigned to request positions (first phase),
state $q_2$ to acknowledgments (second phase). Moreover, $\mathcal{A}$ is
equipped with registers $r_1$ and $r_2$. During the first phase, $r_1$ always
contains the data value of the current position, and $r_2$ the data value of
the $\prec_{+1}$-predecessor (unless we deal with the very first position,
where $r_2$ is undefined, denoted $\bot$). These invariants are ensured by
transitions 1 and 2. In the second phase, by transition 3, position $n+1$
carries the same data value as the first position, which is the only request
with undefined $r_2$. Guard $(\prec_\sim, r_2) = \bot$ is actually an
abbreviation for $\neg((\prec_\sim, r_2) = (\prec_\sim, r_2))$. By transition 4,
position $n+i$ with $i \ge 2$ has to match the request position whose
$r_2$-contents equals $r_1$
at $n+i-1$. Finally, $F_{\prec_\sim} = \{q_2\}$, $F_{\prec_{+1}} = \{q_2\}$, and $\Phi = \neg(q_1 \le 0)$.
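
The behavior described in Example 5 can be replayed with a small simulator. This is only a sketch of this one automaton, not a general class register automaton engine: it assumes m = 1, a word given as a list of pairs `(label, value)`, and hard-codes the four transitions as described in the text. Since source states and guards leave at most one applicable transition per position, no search is needed. The names `run_example5` and `accepts` are hypothetical, and the global condition is modeled simply as "the word is non-empty", which is an assumption.

```python
BOT = None  # undefined register content or undefined source state


def run_example5(word):
    """Return the list of configurations (state, r1, r2), or None if stuck."""
    configs = []
    last_pos = {}  # data value -> 0-based index of the latest position with it
    for i, (label, d) in enumerate(word):
        prev_succ = configs[i - 1] if i > 0 else None
        prev_cls = configs[last_pos[d]] if d in last_pos else None
        p_succ = prev_succ[0] if prev_succ else BOT
        p_cls = prev_cls[0] if prev_cls else BOT
        if label == "r" and p_cls is BOT and p_succ is BOT:
            cfg = ("q1", d, BOT)                      # transition 1
        elif label == "r" and p_cls is BOT and p_succ == "q1":
            cfg = ("q1", d, prev_succ[1])             # transition 2
        elif (label == "a" and p_cls == "q1" and p_succ == "q1"
              and prev_cls[2] is BOT):
            cfg = ("q2", d, BOT)                      # transition 3
        elif (label == "a" and p_cls == "q1" and p_succ == "q2"
              and prev_cls[2] is not BOT and prev_cls[2] == prev_succ[1]):
            cfg = ("q2", d, BOT)                      # transition 4
        else:
            return None
        configs.append(cfg)
        last_pos[d] = i
    return configs


def accepts(word):
    run = run_example5(word)
    if not run:  # stuck, or empty word (assumed to be excluded globally)
        return False
    has_cls_succ, last_pos = set(), {}
    for i, (_label, d) in enumerate(word):
        if d in last_pos:
            has_cls_succ.add(last_pos[d])
        last_pos[d] = i
    # local acceptance: maximal positions of each relation must be in q2
    class_ok = all(run[i][0] == "q2"
                   for i in range(len(word)) if i not in has_cls_succ)
    return class_ok and run[-1][0] == "q2"
```

Running it on (r,8)(r,5)(a,8)(a,5) reproduces the register contents of the run described above, while swapping the two acknowledgments leaves no applicable transition at the third position.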

For the language $L$ from Example 5, one can show
$L \notin \mathrm{CMA}(\mathcal{S}^1_{+1,\sim})$, using an easy pumping
argument. Next, we will see that non-guessing class register automata, though
more expressive than class memory automata, are not yet enough to capture rEMSO
logic. Thus, dropping just one feature such as registers or guessing data
values makes class register automata incomparable to the logic. Assume $m = 2$
and consider $\mathcal{S}^2_\sim = \{\prec^1_\sim, \prec^2_\sim\}$ (cf.
Example 1).

Lemma 1. $\mathrm{rFO}(\mathcal{S}^2_\sim) \not\subseteq \mathrm{CRA}^-(\mathcal{S}^2_\sim)$.

Proof. We determine a formula $\varphi \in \mathrm{rFO}(\mathcal{S}^2_\sim)$ and
show, by contradiction, that every non-guessing class register automaton
capturing $L = L(\varphi)$ will necessarily accept a data word outside $L$.
Roughly speaking, $L$ consists of words where every position belongs to a
pattern that is depicted in Figure 4 and captured by the formula
$\mathit{pattern}(x_1, \ldots, x_4) = x_1 \prec^1_\sim x_3 \wedge x_1 \prec^2_\sim x_4 \wedge x_2 \prec^2_\sim x_3 \wedge x_2 \prec^1_\sim x_4$.
With this,
$\varphi = \forall x\, \exists x_1, \ldots, x_4\, (x \in \{x_1, \ldots, x_4\} \wedge \mathit{pattern}(x_1, \ldots, x_4)) \in \mathrm{rFO}(\mathcal{S}^2_\sim)$
is the formula for $L$. Suppose that there is a non-guessing class register
automaton $\mathcal{A}$ over $\mathcal{S}^2_\sim$ recognizing $L$. We build a
data word $w = (a, d_1) \ldots (a, d_n) \in L$ with $n \in 4\mathbb{N}$ and
$\{d^1_1, \ldots, d^1_n\} \cap \{d^2_1, \ldots, d^2_n\} = \emptyset$ by nesting
disjoint patterns as depicted in Figure 4: we first create
$i_1, \ldots, i_4$, then add $j_1, \ldots, j_4$; the next pattern
is to be inserted at m, . . . , 114, etc. We assume that the data values of distinct 
patterns are disjoint. If we choose $n$ large enough, then there are an accepting
run $\xi = (q_1, \rho_1) \ldots (q_n, \rho_n)$ of $\mathcal{A}$ on $w$ (with transition $t_i = (p_i, g_i) \xrightarrow{a} (q_i, f_i)$ at







Fig. 4. Nested patterns 




position $i$) and positions $i_1, \ldots, i_4, j_1, \ldots, j_4$ of $w$ such
that $j_1 < i_1$, both $i_1, \ldots, i_4$ and $j_1, \ldots, j_4$ form two
(disjoint) patterns, and $t_{i_1} = t_{j_1}, \ldots, t_{i_4} = t_{j_4}$. Now,
consider the data word $w'$ that we obtain from $w$ when we swap the second
data values of positions $i_1$ and $j_1$. Thus, the data part of $w'$ is

$d_1 \ldots d_{j_1-1}\, (d^1_{j_1}, d^2_{i_1})\, d_{j_1+1} \ldots d_{i_1-1}\, (d^1_{i_1}, d^2_{j_1})\, d_{i_1+1} \ldots d_n$.

We have the situation depicted in Figure 5. In particular, $i_1, \ldots, i_4$
do not form a single closed cycle. This violates $\varphi$, as $x = i_1$
implies $x_1 \in \{i_1, i_2\}$. Thus, $w' \notin L$. However, applying
transitions $t_1, \ldots, t_n$ still yields an accepting run
$\xi' = (q_1, \rho'_1) \ldots (q_n, \rho'_n)$ of $\mathcal{A}$ on $w'$. For
$i \in [n]$, $\rho'_i$ is then given as follows:



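$\rho'_i(r) =$
    $d^2_{i_1}$ if $\rho_i(r) = d^2_{j_1}$ and $i \in \{j_1, j_3\}$
    $d^2_{j_1}$ if $\rho_i(r) = d^2_{i_1}$ and $i \in \{i_1, i_3\}$
    $d^1_{i_1}$ if $\rho_i(r) = d^1_{j_1}$ and $i = j_4$
    $d^1_{j_1}$ if $\rho_i(r) = d^1_{i_1}$ and $i = i_4$
    $\rho_i(r)$ otherwise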



One can verify that $\xi'$ is indeed an accepting run on $w'$. □

The proof of Lemma 1 can be adapted to show
$\mathrm{rFO}(\mathbb{S}_{\mathrm{dyn}}) \not\subseteq \mathrm{CRA}^-(\mathbb{S}_{\mathrm{dyn}})$. It
reveals that non-guessing class register automata cannot, in general, detect cycles.
However, this is needed to capture rFO logic [16]. In Section 5, we show that full
class register automata capture rFO and, as they are closed under projection,
also rEMSO logic. Closure under projection is meant in the following sense. Let
$\Gamma$ be a non-empty finite alphabet. Given $\mathbb{S} = (\sigma, I)$, we define another signature
$\mathbb{S}_\Gamma$ for data words over $(\Sigma \times \Gamma) \times D^m$. Its set of relation symbols is
$\{\lhd_\Gamma \mid \lhd \in \mathbb{S}\}$. For $w \in ((\Sigma \times \Gamma) \times D^m)^*$, we set
$i \lhd_\Gamma^w j$ iff $i \lhd^{\mathit{proj}_\Sigma(w)} j$. Hereby, the projection
$\mathit{proj}_\Sigma$ just removes the $\Gamma$ component while keeping $\Sigma$ and the data values. For
$\mathcal{C} \in \{\mathrm{CRA}, \mathrm{CRA}^-, \mathrm{CMA}\}$, we say that $\mathcal{C}(\mathbb{S})$ is closed under projection if, for
every $\Gamma$ and $L \subseteq ((\Sigma \times \Gamma) \times D^m)^*$, $L \in \mathcal{C}(\mathbb{S}_\Gamma)$ implies
$\mathit{proj}_\Sigma(L) \in \mathcal{C}(\mathbb{S})$.
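
The projection itself is a purely positional operation. The following is a minimal sketch, assuming a data word is represented as a Python list of pairs `((sigma, gamma), data)`; the names `proj_sigma` and `proj_language` are illustrative and do not come from the paper.

```python
from typing import Hashable, Iterable, List, Sequence, Set, Tuple

# Hypothetical representation: one position of an extended data word is
# ((sigma_label, gamma_label), data_tuple); after projection it is
# (sigma_label, data_tuple).
ExtLetter = Tuple[Tuple[Hashable, Hashable], Tuple[int, ...]]
Letter = Tuple[Hashable, Tuple[int, ...]]


def proj_sigma(word: Sequence[ExtLetter]) -> List[Letter]:
    """Remove the Gamma component, keeping the Sigma label and the data values."""
    return [(sigma, data) for ((sigma, _gamma), data) in word]


def proj_language(words: Iterable[Sequence[ExtLetter]]) -> Set[Tuple[Letter, ...]]:
    """Project every word of a (finite sample of a) language."""
    return {tuple(proj_sigma(w)) for w in words}
```

Since positions and data values are untouched, two extended words that differ only in their $\Gamma$ components project to the same data word.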

Lemma 2. For every signature $\mathbb{S}$, $\mathrm{CRA}(\mathbb{S})$, $\mathrm{CRA}^-(\mathbb{S})$, and $\mathrm{CMA}(\mathbb{S})$ are closed
under union, intersection, and projection. They are, in general, not closed under
complementation.

Proof. Closure under union and intersection follows standard automata-theoretic 
constructions. Closure under projection holds since projection preserves the 
graph structure of a data word. For non-complementability, we can rely on the
corresponding result for communicating automata [9]. Roughly speaking, a
communicating automaton is a dynamic communicating automaton with a fixed
set of at least two processes $\mathit{Proc}$. It can be identified as a special case of our
framework: We let $m = 0$, since the number of processes is fixed. Moreover,
$\Sigma = \{\, !(c,d),\ ?(c,d) \mid c, d \in \mathit{Proc}$ such that $c \neq d \,\}$ is the set of actions. Finally,
we define the signature $\mathbb{S}_{\mathit{Proc}} = \{\prec_{\mathrm{proc}}, \prec_{\mathrm{msg}}\}$ as the straightforward restriction
of $\mathbb{S}_{\mathrm{dyn}}$ (cf. Example [5]) to this bounded case. Speaking in terms of our
framework, [9] indeed shows that class register automata (or, as $m = 0$, class memory
automata) over $\mathbb{S}_{\mathit{Proc}}$ are not closed under complementation. □

5 Realizability of EMSO Specifications 

In this section, we solve the realizability problem for rEMSO specifications: 

Theorem 1. For all signatures $\mathbb{S}$, $\mathrm{rEMSO}(\mathbb{S}) \subseteq \mathrm{CRA}(\mathbb{S})$. An automaton can be
computed in elementary time and is of elementary size.

Classical procedures that translate formulas into automata follow an induc- 
tive approach, use two-way mechanisms and tools such as pebbles, or rely on 
reductions to existing translations. There is no obvious way to apply any of 
these techniques to prove our theorem. 

We therefore follow a technique from [9J , which is based on ideas from [2"2l2"l] . 
We first transform the first-order kernel of the formula at hand into a normal 
form due to Hanf [16]. According to that normal form, satisfaction of a
first-order formula w.r.t. a data word $w$ only depends on the spheres that occur in
G(w), and on how often they occur, counted up to a threshold. The size of a 
sphere is bounded by a radius that depends on the formula. The threshold can 
be computed from the radius and |S|. We can indeed apply Hanf's Theorem, as 
the structures that we consider have bounded degree: every node/word position 
has at most |S| incoming and at most |S| outgoing edges. In a second step, we 
transform the formula in normal form into a class register automaton. 

Recall that $B\text{-}\mathit{Sph}^G(i)$ denotes the $B$-sphere of graph/data word $G$ around $i$
(cf. Section^). Its size (number of nodes) is bounded by $\mathit{maxSize} := (2|\mathbb{S}| + 2)^B$.
Let $B\text{-}\mathit{Spheres}_{\mathbb{S}} = \{B\text{-}\mathit{Sph}^G(i) \mid G = (V, \ldots)$ is an $\mathbb{S}$-graph and $i \in V\}$. We do
not distinguish between isomorphic structures so that $B\text{-}\mathit{Spheres}_{\mathbb{S}}$ is finite.

Theorem 2 (cf. [8, 16]). Let $\varphi \in \mathrm{rFO}(\mathbb{S})$. One can compute, in elementary
time, $B \in \mathbb{N}$ and a boolean formula $\beta$ over $\{$'$S < N$' $\mid S \in B\text{-}\mathit{Spheres}_{\mathbb{S}}$ and
$N \in \mathbb{N}\}$ such that $L(\varphi)$ is the set of data words that satisfy $\beta$. Here, we say that
$w = w_1 \ldots w_n$ satisfies atom $S < N$ iff $|\{i \in [n] \mid B\text{-}\mathit{Sph}^w(i) \cong S\}| < N$. The
radius $B$ and the size of $\beta$ and its constants $N$ are elementary in $|\varphi|$ and $|\mathbb{S}|$.

Proof. A simple but crucial observation is that there exists a first-order sentence
that is equivalent to $\varphi$ but talks about $G(w)$ rather than $w$. We simply write
$\lambda(x) = a$ instead of $\ell(x) = a$, and $\bigvee_{\eta \in \mathcal{P}} \nu(x) = \eta$ instead of $d^k(x) = d^l(x)$ where
$\mathcal{P} \subseteq \mathit{Part}(m)$ is the set of partitions of $[m]$ such that $k$ and $l$ occur in the same
set. As $\mathrm{rMSO}(\mathbb{S})$-formulas cannot distinguish between data words that induce the
same graph, the boolean formula $\beta$ in normal form exists due to [16]. Actually,
$\beta$ can be computed in triply exponential time [5]. □
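
To make the shape of the normal form concrete, here is a small sketch of how a boolean combination of threshold atoms '$S < N$' could be evaluated once the sphere of every position is known. The tuple encoding of formulas and the function names are assumptions made for this illustration; spheres are treated as opaque identifiers that are equal exactly when the spheres are isomorphic.

```python
from typing import Hashable, Sequence

# Hypothetical formula encoding (not from the paper):
#   ("atom", S, N)  stands for the atom "S < N"
#   ("not", f), ("and", f, g), ("or", f, g)  are boolean connectives


def count_upto(spheres: Sequence[Hashable], S: Hashable, threshold: int) -> int:
    """Count positions whose sphere is S, stopping once the threshold is reached."""
    count = 0
    for s in spheres:
        if s == S:
            count += 1
            if count >= threshold:
                break
    return count


def satisfies(spheres: Sequence[Hashable], beta: tuple) -> bool:
    """Evaluate the boolean formula beta on the list of per-position spheres."""
    op = beta[0]
    if op == "atom":
        _, S, N = beta
        return count_upto(spheres, S, N) < N
    if op == "not":
        return not satisfies(spheres, beta[1])
    if op == "and":
        return satisfies(spheres, beta[1]) and satisfies(spheres, beta[2])
    if op == "or":
        return satisfies(spheres, beta[1]) or satisfies(spheres, beta[2])
    raise ValueError(f"unknown connective: {op!r}")
```

Counting stops at the threshold, which mirrors the fact that occurrences only matter "up to a threshold".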

By Theorem 2, it will be useful to have a class register automaton that,
when reading a position $i$ of data word $w$, outputs the sphere of $w$ around $i$. Its
construction is actually the main difficulty in the proof of Theorem 1, as spheres
have to be computed "in one go", i.e., reading the word from left to right, while
accessing only certain configurations from the past.

Proposition 1. Let $B \in \mathbb{N}$. One can compute, in elementary time, a class
register automaton $\mathcal{A}_B = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \mathit{true})$ over $\mathbb{S}$, as well as a mapping
$\pi : Q \to B\text{-}\mathit{Spheres}_{\mathbb{S}}$ such that $L(\mathcal{A}_B) = (\Sigma \times D^m)^*$ and, for every data word
$w = w_1 \ldots w_n$, every accepting run $(q_1, \rho_1) \ldots (q_n, \rho_n)$ of $\mathcal{A}_B$ on $w$, and every
$i \in [n]$, $\pi(q_i) = B\text{-}\mathit{Sph}^w(i)$. Moreover, $|Q|$ and $|R|$ are elementary in $B$ and $|\mathbb{S}|$.

The proposition is proved below. Let us first show how we can use it, together
with Theorem 2, to translate an rEMSO formula into a class register automaton.

Proof (of Theorem 1). Let $\varphi = \exists X_1 \ldots \exists X_n\, \psi \in \mathrm{rEMSO}(\mathbb{S})$ be a sentence with
$\psi \in \mathrm{rFO}(\mathbb{S})$ (we also assume $n \geq 1$). Since Theorem 2 applies to first-order
formulas only, we extend $\Sigma$ to $\Sigma \times \Gamma$ where $\Gamma = 2^{\{1, \ldots, n\}}$. Consider the extended
signature $\mathbb{S}_\Gamma$ (cf. Section 4). From $\psi$, we obtain a formula $\psi_\Gamma \in \mathrm{rFO}(\mathbb{S}_\Gamma)$ by
replacing $\ell(x) = a$ with $\bigvee_{M \in \Gamma} \ell(x) = (a, M)$ and $x \in X_j$ with
$\bigvee_{a \in \Sigma,\, M \in \Gamma} \ell(x) = (a, M \cup \{j\})$. Consider the radius $B \in \mathbb{N}$ and the normal form $\beta_\Gamma$ for $\psi_\Gamma$ due to
Theorem 2. Let $\mathcal{A}_B = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}_\Gamma}, \mathit{true})$ be the class register automaton
over $\mathbb{S}_\Gamma$ from Proposition 1 and $\pi$ be the associated mapping. We replace the global
acceptance condition of $\mathcal{A}_B$ by the formula obtained from $\beta_\Gamma$ by replacing every atom $S < N$ with
$\pi^{-1}(S) < N$ (which can be expressed as a suitable boolean formula). We thus obtain
$\mathcal{A}'_B$, a class register automaton satisfying $L(\mathcal{A}'_B) = L(\psi_\Gamma)$. Exploiting closure
under projection (Lemma 2), we obtain a class register automaton over $\mathbb{S}$ that
recognizes $L(\varphi) = \mathit{proj}_\Sigma(L(\psi_\Gamma))$. □

The Sphere Automaton. In the remainder of this section, we construct the
class register automaton $\mathcal{A}_B = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \mathit{true})$ from Proposition 1
together with $\pi : Q \to B\text{-}\mathit{Spheres}_{\mathbb{S}}$. The idea is that, at each position $i$ in the
data word $w$ at hand, $\mathcal{A}_B$ guesses the $B$-sphere $S$ of $w$ around $i$. To verify that
the guess is correct, i.e., $S \cong B\text{-}\mathit{Sph}^w(i)$, $S$ is passed to each position that is
connected to $i$ by an edge in $G(w)$. That new position locally checks label and
data equalities imposed by $S$, then also forwards $S$ to its neighbors, and so on.
Thus, at any time, several local patterns have to be validated simultaneously
so that a state $q \in Q$ is actually a set of spheres. In fact, we consider extended
spheres $E = (S, \alpha, \mathit{col})$ where $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$ is a sphere (with universe
$U$ and sphere center $\gamma$), $\alpha \in U$ is the active node, and $\mathit{col}$ is a color from a finite
set, which will be specified later. The active node $\alpha$ indicates the current context,
i.e., it corresponds to the position currently read.

Let $B\text{-}\mathit{eSpheres}_{\mathbb{S}}$ denote the set of extended spheres, which is finite up to
isomorphism. For $E = (S, \alpha, \mathit{col}) \in B\text{-}\mathit{eSpheres}_{\mathbb{S}}$, $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$, and
$j \in U$, we let $E[j]$ refer to the extended sphere $(S, j, \mathit{col})$ where the active node $\alpha$
has been replaced with $j$. Now suppose that the state $q$ of $\mathcal{A}_B$ that is reached after
reading position $i$ of data word $w$ contains $E = (S, \alpha, \mathit{col})$. Roughly speaking,
this means that the neighborhood of $i$ in $w$ shall look like the neighborhood of
$\alpha$ in $S$. Thus, if $S$ contains $j'$ such that $\alpha \lhd^E j'$, then we must find $i'$ such that
$i \lhd^w i'$ in the data word. Local final states will guarantee that $i'$ indeed exists.
Moreover, the state assigned to $i'$ in a run of $\mathcal{A}_B$ will contain the new proof
obligation $E[j']$ and so forth. Similarly, an edge in (the graph of) $w$ has to be
present in spheres, unless it is beyond their scope, which is limited by $B$. All
this is reflected below, in conditions T2-T6 of a transition.

We are still facing two major difficulties. Several isomorphic spheres have 
to be verified simultaneously, i.e., a state must be allowed to include isomor- 
phic spheres in different contexts. A solution to this problem is provided by 
the additional coloring col. It makes sure that centers of overlapping isomorphic 
spheres with different colors refer to distinct nodes in the input word. To put 
it differently, for a given position $i$ in data word $w$, there may be $i'$ such that
$0 < \mathit{dist}^w(i, i') \leq 2B + 1$ and $B\text{-}\mathit{Sph}^w(i) \cong B\text{-}\mathit{Sph}^w(i')$. Fortunately, there cannot
be more than $(2|\mathbb{S}| + 1) \cdot \mathit{maxSize}^2$ such positions. As a consequence, the coloring
$\mathit{col}$ can be restricted to the set $\{1, \ldots, (2|\mathbb{S}| + 1) \cdot \mathit{maxSize}^2 + 1\}$.
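
For a feeling of the magnitudes involved, the two bounds can be evaluated directly. This is only an arithmetic sketch of the formulas $\mathit{maxSize} = (2|\mathbb{S}|+2)^B$ and $(2|\mathbb{S}|+1) \cdot \mathit{maxSize}^2 + 1$ as read from the text; the function names are illustrative.

```python
def max_size(num_relations: int, radius: int) -> int:
    """Upper bound (2|S| + 2)^B on the number of nodes of a B-sphere."""
    return (2 * num_relations + 2) ** radius


def num_colors(num_relations: int, radius: int) -> int:
    """Size (2|S| + 1) * maxSize^2 + 1 of the color set used for extended spheres."""
    return (2 * num_relations + 1) * max_size(num_relations, radius) ** 2 + 1


if __name__ == "__main__":
    # Two relation symbols (for instance a successor and a class successor), radius 2.
    print(max_size(2, 2), num_colors(2, 2))
```

Both quantities are exponential in $B$ and polynomial in $|\mathbb{S}|$, which is what keeps the state space elementary.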

Implementing these ideas alone would do without registers and yield a class 
memory automaton. But this cannot work due to Lemma 1. Indeed, a faithful
simulation of cycles in spheres has to make use of data values. They need to be
anticipated, stored in registers, and locally compared with current data values
from the input word. We introduce a register $(E, k)$ for every extended sphere
$E$ and $k \in [m]$. To get the idea behind this, consider a run $(q_1, \rho_1) \ldots (q_n, \rho_n)$
of $\mathcal{A}_B$ on $w = (a_1, d_1) \ldots (a_n, d_n)$. Pick a position $i$ of $w$ and suppose that
$E = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma, \alpha, \mathit{col}) \in q_i$. If $\alpha$ is minimal in $E$, then there is no
pending requirement to check. Now, as $\alpha$ shall correspond to the current position
$i$ of $w$, we write, for every $k \in [m]$, $d_i^k$ into register $(E, k)$ (first case of T8 below).
For all $j \in U \setminus \{\alpha\}$, on the other hand, we anticipate data values and store them
in $(E[j], k)$ (also first case of T8). They will be forwarded (second case of T8)
and checked later against both the guesses made at other minimal nodes of $E$
(guard $g_3$ of T7) and the actual data values in $w$ (guard $g_2$). This procedure
makes sure that the values that we carry along within an accepting run agree
with the actual data values of $w$.

Now, as $\mathit{prev}^w_\lhd$ and $\mathit{next}^w_\lhd$ are monotone w.r.t. positions with identical labels
and data values, two isomorphic cycles cannot be "merged" into one larger one,
unlike in non-guessing class register automata where different parts may act
erroneously on the assumption of inconsistent data values (cf. Lemma 1). As a
consequence, spheres are correctly simulated by the input word.

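Let us formalize $\mathcal{A}_B = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \mathit{true})$ and the mapping
$\pi : Q \to B\text{-}\mathit{Spheres}_{\mathbb{S}}$, following the above ideas. The set of registers is
$R = B\text{-}\mathit{eSpheres}_{\mathbb{S}} \times [m]$. A state from $Q$ is a non-empty set $q \subseteq B\text{-}\mathit{eSpheres}_{\mathbb{S}}$ such that

(i) there is a unique $E = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma, \alpha, \mathit{col}) \in q$ such that $\gamma = \alpha$ (we
set $\pi(q) = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$ to obtain the mapping required by Prop. 1),

(ii) there are $a \in \Sigma$ and $\eta \in \mathit{Part}(m)$ such that, for all $E = (\ldots, \lambda, \nu, \ldots) \in q$,
we have $\lambda(\alpha) = a$ and $\nu(\alpha) = \eta$ (we let $\mathit{label}(q) = a$ and $\mathit{data}(q) = \eta$), and

(iii) for every $(S, \alpha, \mathit{col}), (S, \alpha', \mathit{col}) \in q$, we have $\alpha = \alpha'$.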
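
The three state conditions can be checked mechanically. The sketch below is a simplification under stated assumptions: a sphere is represented by an opaque canonical identifier, and the label and data partition at the active node are stored directly in the record instead of being looked up in $\lambda$ and $\nu$. The class `ExtSphere` and the function `is_state` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Iterable


@dataclass(frozen=True)
class ExtSphere:
    sphere: Hashable                      # canonical identifier of S (assumed given)
    center: Hashable                      # sphere center gamma
    active: Hashable                      # active node alpha
    col: int                              # color
    label: Hashable                       # lambda(alpha), stored for convenience
    partition: FrozenSet[FrozenSet[int]]  # nu(alpha), a partition of [m]


def is_state(q: Iterable[ExtSphere]) -> bool:
    """Check conditions (i)-(iii) for a candidate set of extended spheres."""
    q = set(q)
    if not q:
        return False
    # (i) exactly one extended sphere whose active node is its center
    if sum(1 for e in q if e.center == e.active) != 1:
        return False
    # (ii) all active nodes agree on label and data partition
    if len({(e.label, e.partition) for e in q}) != 1:
        return False
    # (iii) the same sphere with the same color has a single active node
    active_of = {}
    for e in q:
        key = (e.sphere, e.col)
        if active_of.setdefault(key, e.active) != e.active:
            return False
    return True
```

Condition (iii) is the place where the coloring matters: two copies of the same sphere may coexist in a state only if their colors differ.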

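Before we turn to the transitions, we introduce some notation. Below, $E$ will
always denote $(S, \alpha, \mathit{col})$ with $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$; in particular, $\alpha$ refers
to the active node of $E$. The mappings $\mathit{next}^E_\lhd$, $\mathit{prev}^E_\lhd$, and $\mathit{dist}^E$ are defined for
extended spheres in the obvious manner. For $j \in U$, we set $\mathit{type}^-(j) = \{\lhd \in \mathbb{S} \mid
\mathit{prev}^E_\lhd(j)$ is defined$\}$. Let us fix, for all $E \in B\text{-}\mathit{eSpheres}_{\mathbb{S}}$ such that $\mathit{type}^-(\alpha) \neq \emptyset$,
some arbitrary $\lhd_E \in \mathit{type}^-(\alpha)$. Finally, for state $q$ and $k_1, k_2 \in [m]$, we write
$k_1 \sim_q k_2$ if there is $K \in \mathit{data}(q)$ such that $\{k_1, k_2\} \subseteq K$.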

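We have a transition $(p, g) \xrightarrow{a} (q, f)$ iff the following hold:

T1 $\mathit{label}(q) = a$

T2 for all $\lhd \in \mathbb{S} \setminus \mathit{dom}(p)$ and $E \in q$: $\mathit{prev}^E_\lhd(\alpha)$ is undefined

T3 for all $\lhd \in \mathit{dom}(p)$, $E \in q$, and $j \in U$: $j \lhd^E \alpha \iff E[j] \in p_\lhd$

T4 for all $\lhd \in \mathit{dom}(p)$, $E \in p_\lhd$, and $j \in U$: $\alpha \lhd^E j \iff E[j] \in q$

T5 for all $\lhd \in \mathit{dom}(p)$ and $E \in q$: $\mathit{prev}^E_\lhd(\alpha)$ is undefined $\implies \mathit{dist}^E(\gamma, \alpha) = B$

T6 for all $\lhd \in \mathit{dom}(p)$ and $E \in p_\lhd$: $\mathit{next}^E_\lhd(\alpha)$ is undefined $\implies \mathit{dist}^E(\gamma, \alpha) = B$

T7 $g = g_1 \wedge g_2 \wedge g_3$ where

$$g_1 = \bigwedge_{\substack{k_1, k_2 \in [m] \\ k_1 \sim_q k_2}} k_1 = k_2 \;\wedge \bigwedge_{\substack{k_1, k_2 \in [m] \\ k_1 \not\sim_q k_2}} \neg (k_1 = k_2)
\qquad
g_2 = \bigwedge_{\substack{k \in [m],\ E \in q \\ \lhd \in \mathit{type}^-(\alpha)}} k = (\lhd, (E, k))$$

$$g_3 = \bigwedge_{\substack{k \in [m],\ E \in q,\ j \in U \\ \lhd_1, \lhd_2 \in \mathit{type}^-(\alpha)}} (\lhd_1, (E[j], k)) = (\lhd_2, (E[j], k))$$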

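T8 for all $k \in [m]$ and $E \in B\text{-}\mathit{eSpheres}_{\mathbb{S}}$:

$$f((E, k)) = \begin{cases}
(k, \mathit{dist}^E(j, \alpha)) & \text{if } \exists j \in U : E[j] \in q \text{ and } \mathit{type}^-(j) = \emptyset \\
(\lhd_{E[j]}, (E, k)) & \text{if } \exists j \in U : E[j] \in q \text{ and } \mathit{type}^-(j) \neq \emptyset \\
\text{undefined} & \text{otherwise}
\end{cases}$$

For every $\lhd \in \mathbb{S}$, the local acceptance condition is given by $F_\lhd = \{q \in Q \mid$
for all $E \in q$, $\mathit{next}^E_\lhd(\alpha)$ is undefined$\}$. Recall that the global one is $\mathit{true}$.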

As the maximal size of a sphere is exponential in B and polynomial in |S|, 
the numbers \Q\ and \R\ are elementary in B and |S|. Note that Ab can actually 
be constructed in elementary time. 

In the appendix, we show that the construction of $\mathcal{A}_B$ and $\pi$ is correct in the
sense of Proposition 1.

6 From Automata to Logic

Next, we give translations from automata back to logic. Note that rEMSO(S 1 ) _ 1 ) Q 
CMA(S]j_ 1 ), as rEMSO(§ 1 |_ 1 ) cannot reason about data values. However, we show 
that the behavior of a class register automaton is always MSO definable and, 
in a sense, "regular". There are natural finite-state automata that do not share 
this property: two-way register automata (even deterministic ones) over one- 
dimensional data words are incomparable to MSO(§^ x ^) |21j . 

Theorem 3. For every signature $\mathbb{S}$, we have $\mathrm{CRA}(\mathbb{S}) \subseteq \mathrm{MSO}(\mathbb{S})$.

Proof. As usual, second-order variables are used to encode an assignment of
positions to transitions, which is then checked for being an accepting run. To
simulate register contents, we extend a technique from [21]. Let us describe how
a class register automaton $\mathcal{A} = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \Phi)$ over $\mathbb{S}$ is translated into
an $\mathrm{MSO}(\mathbb{S})$-sentence $\varphi_{\mathcal{A}}$ such that $L(\varphi_{\mathcal{A}}) = L(\mathcal{A})$. Suppose $\mathfrak{B}$ is the maximum
of all $B$ for which there is a transition $(p, g) \xrightarrow{a} (q, f) \in \Delta$ with $f(r) = (k, B)$,
for some $r$ and $k$.

We assume a second-order variable Xg for every transition 6 E A. Moreover, 
we assume a variable B for each r E R, B E {1, ... ,23}, and each formula 
(3(x u ,x v ) E rFO(S), with free variables x u and x v , that is of the form 

f3(x Ul x v ) = 3xi, xb {x u Mi xi M 2 ... Mjj x B = af v ) 

where Mj E { = , < , | < E S}. The intuition of these variables is as follows. 
If a position x is contained in X& with S — (p,g) — [q,f) and f(r) = (k,B), 
then x will also be contained in some X^ B , meaning that x executes S and the 
new data value of r is the fc-th data value at the unique y such that /3(x,y) is 
satisfied. 

The formula tp A will be of the form 3(Xg) s 3(X^ B ) g (-01 A -02)- Here, ip± E 
rFO(§) checks whether the following hold: 

— each position x is contained in exactly one set Xg 

— for all x and r E R, x is contained in at most one set of the form X^ B 

— if x E Xg with S — (p,g) —t (q,f) and f(r) = (k,B), then x E B for 
some /? 

— the label at position x E Xg corresponds to the label of 5 

— conditions (1) and (2) in the definition of a run are met

— the (potential) run is accepting, i.e., the local conditions $F_\lhd$ and the global condition $\Phi$ are respected

It remains to define i/> 2 G MSO(S) to check property (3) of a run. This can 
be done by means of formulas tp g (x) , one for each atomic guard g G { 9i = 62 
81,62 G [m] U (S x i?)}. We restrict here to g = ((<J,r) = I) with I G [m]. The 
other cases are similar. Formula ipg( x ) checks if the contents of r at position 
prev^x) equals the Z-th data value at x. It will be of the form 3X 3(X r ) reR \g- 
The idea is that the positions in X describe a path x\ <h X2 <2 •■• <n-i 
x n < x that "transports" the data value d l {x). We suppose that every position 
Xi is contained in precisely one set X Ti meaning that register r,; is updated by 
the contents of Ti—\ at position Xi-\. More precisely, we require that, for all 
i G {2, . . . , n}, there is a transition S with register-update mapping / such that 
Xi G X$ and /(fj) = (<j_i, 7*i_i). The last update should concern r, i.e., we 
require x n G X r . So suppose 2^ G X ri . It remains to ensure that register 7*1, at 
xi, obtains the value d l (x). More precisely, there should be a transition S with 
update mapping /, as well as fc, -B, j3 and a position xo such that /3(xi, xo) holds, 
/(n) = (k,B), neljn and d fc (x ) = d'(x). 

Note that \g can be defined as an FO(S)-formula and ip g (x) holds iff the 
register contents of r at prev < (x) equals d l (x). □ 

In the proof, the non-local predicate $d^k(x) = d^l(y)$ is indeed essential to simulate
register assignments, as we need to compare data values at positions where
registers are updated. For one-dimensional data words, however, the predicate
can be easily defined in $\mathrm{rMSO}(\mathbb{S}^1_{+1,\sim})$. The following theorem is dedicated to
this classical setting over $\mathbb{S}^1_{+1,\sim}$.

Theorem 4. We have the inclusions depicted in the figure. Here, $\rightarrow$ means
'strictly included' and $\dashrightarrow$ means 'included'.



Proof. The inclusion $\mathrm{rEMSO}(\mathbb{S}^1_{+1,\sim}) \subseteq \mathrm{CRA}(\mathbb{S}^1_{+1,\sim})$ is due to Theorem 1 and
$\mathrm{CRA}(\mathbb{S}^1_{+1,\sim}) \subseteq \mathrm{MSO}(\mathbb{S}^1_{+1,\sim})$ is due to Theorem 3. The equality
$\mathrm{MSO}(\mathbb{S}^1_{+1,\sim}) = \mathrm{rMSO}(\mathbb{S}^1_{+1,\sim})$ is obvious.

$\mathrm{CMA}(\mathbb{S}^1_{+1,\sim}) \subsetneq \mathrm{rEMSO}(\mathbb{S}^1_{+1,\sim})$: Consider a class memory automaton $\mathcal{A}$.
As $\mathcal{A}$ is completely state-based and does not make use of any register, it is
standard to define a sentence $\varphi \in \mathrm{rEMSO}(\mathbb{S}^1_{+1,\sim})$ such that $L(\varphi) = L(\mathcal{A})$. It
remains to show strictness of the inclusion.¹ Suppose $\Sigma = \{r, a\}$ and $D = \mathbb{N}$,
and let $L = [\{(r,1) \ldots (r,n)(a,1) \ldots (a,n) \mid n \geq 1\}]_{\mathbb{S}^1_{+1,\sim}}$ (note that the proof
also works if $\Sigma$ is a singleton). Towards a contradiction, suppose $L$ is
recognized by class memory automaton $\mathcal{A}$. As $\mathcal{A}$ has no access to registers, a run
of $\mathcal{A}$ on $(r,1) \ldots (r,n)(a,1) \ldots (a,n)$ is actually a sequence of states $q_1 \ldots q_{2n}$. If
$n$ is large enough, there are positions $1 \leq i < j \leq n$ such that $q_i = q_j$. Now,
we can simply exchange the data values at positions $i$ and $j$ without
affecting acceptance. More precisely, $q_1 \ldots q_{2n}$ is also an accepting run on the data
word $(r,1) \ldots (r,i{-}1)(r,j)(r,i{+}1) \ldots (r,j{-}1)(r,i)(r,j{+}1) \ldots (r,n)(a,1) \ldots (a,n)$,
which is not contained in $L$, a contradiction. On the other hand, $L$ is defined by the
conjunction $\varphi_1 \wedge \varphi_2$ of the following $\mathrm{rFO}(\mathbb{S}^1_{+1,\sim})$-sentences:

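- $\varphi_1 = \exists x\, \mathit{true} \;\wedge\; \forall x\, \exists^{=1} y\, \bigl( (x \prec_\sim y \wedge \ell(x) = r \wedge \ell(y) = a) \vee (y \prec_\sim x \wedge \ell(y) = r \wedge \ell(x) = a) \bigr)$

- $\varphi_2 = \forall x, y\, \Bigl( x \prec_{+1} y \wedge \neg(\ell(x) = r \wedge \ell(y) = a) \rightarrow \exists x', y'\, \bigl( (x \prec_\sim x' \wedge x' \prec_{+1} y' \wedge y \prec_\sim y') \vee (x' \prec_{+1} y' \wedge y' \prec_\sim y \wedge x' \prec_\sim x) \bigr) \Bigr)$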

The first formula expresses that the word has positive length and each ~ equiv- 
alence class has size two. The second formula ensures the FIFO structure of a 
data word. 
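
The exchange argument can be replayed on a concrete word. The sketch below assumes that $L$ is closed under renaming of data values (our reading of the bracket notation), so membership is tested up to the equality pattern of the data values; `in_L` and `swap_data` are illustrative names.

```python
from typing import List, Tuple

Word = List[Tuple[str, int]]  # one-dimensional data word: (label, data value)


def in_L(word: Word) -> bool:
    """Membership in {(r,1)...(r,n)(a,1)...(a,n) | n >= 1}, up to renaming of data values."""
    if not word or len(word) % 2:
        return False
    n = len(word) // 2
    first, second = word[:n], word[n:]
    if any(lab != "r" for lab, _ in first) or any(lab != "a" for lab, _ in second):
        return False
    data = [d for _, d in first]
    # request values are pairwise distinct and acknowledged in the same (FIFO) order
    return len(set(data)) == n and data == [d for _, d in second]


def swap_data(word: Word, i: int, j: int) -> Word:
    """Exchange the data values at positions i and j (1-based), keeping the labels."""
    w = list(word)
    (li, di), (lj, dj) = w[i - 1], w[j - 1]
    w[i - 1], w[j - 1] = (li, dj), (lj, di)
    return w
```

For example, with `w = [("r", 1), ("r", 2), ("r", 3), ("a", 1), ("a", 2), ("a", 3)]`, the word `w` passes `in_L`, whereas `swap_data(w, 1, 3)` does not, although a purely state-based run cannot tell the two apart once the states at positions 1 and 3 coincide.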

$\mathrm{CMA}(\mathbb{S}^1_{+1,\sim}) \subsetneq \mathrm{CRA}^-(\mathbb{S}^1_{+1,\sim})$: Consider the language $L$ from the previous
paragraph. It is not in $\mathrm{CMA}(\mathbb{S}^1_{+1,\sim})$. However, Example [5] demonstrates that
there is a non-guessing class register automaton recognizing $L$.

$\mathrm{rMSO}(\mathbb{S}^1_{+1,\sim}) \not\subseteq \mathrm{CRA}(\mathbb{S}^1_{+1,\sim})$: We encode grids into data words. An
$(i,j)$-grid is a graph that has a height $i \in \mathbb{N}$ and a width $j \in \mathbb{N}$, meaning that it has $i$
rows and $j$ columns that are connected by a horizontal and a vertical immediate
successor relation. Nodes are labeled by elements from $\Sigma = \{a, b, c\}$. We encode
an $(i,j)$-grid as the data word

$$(a_{11}, 1) \ldots (a_{i1}, i)\,(a_{12}, 1) \ldots (a_{i2}, i) \;\ldots\; (a_{1j}, 1) \ldots (a_{ij}, i)$$

where $a_{kl} \in \Sigma$ is the labeling of the grid node $(k, l)$. Hereby, each subword
$(a_{1k}, 1) \ldots (a_{ik}, i)$ constitutes a column. Then, moving down in the grid
corresponds to a $\prec_{+1}$-step in the data word, moving right corresponds to a $\prec_\sim$-step.
These steps are $\mathrm{rFO}(\mathbb{S}^1_{+1,\sim})$-definable.
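
The encoding is easy to state operationally. In the sketch below, a grid is given as a list of rows (an assumption of this illustration), positions of the data word are 0-based, and the helper names `encode_grid`, `step_down`, and `step_right` are hypothetical.

```python
from typing import List, Optional, Tuple

Word = List[Tuple[str, int]]


def encode_grid(grid: List[List[str]]) -> Word:
    """Encode a grid column by column; the data value of a node is its row number."""
    height, width = len(grid), len(grid[0])
    return [(grid[k][l], k + 1) for l in range(width) for k in range(height)]


def step_down(word: Word, pos: int, height: int) -> Optional[int]:
    """Moving down is a +1 step, unless the node is already in the last row."""
    if word[pos][1] == height or pos + 1 >= len(word):
        return None
    return pos + 1


def step_right(word: Word, pos: int) -> Optional[int]:
    """Moving right is a class-successor step: the next position with the same data value."""
    value = word[pos][1]
    for nxt in range(pos + 1, len(word)):
        if word[nxt][1] == value:
            return nxt
    return None
```

In an encoded grid of height $i$, the class successor of a position is always exactly $i$ positions further to the right, which is why a row of the grid corresponds to one class of the data word.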

¹ Note that satisfiability of $\mathrm{rEMSO}(\mathbb{S}^1_{+1,\sim})$ is undecidable, whereas emptiness of class
memory automata over $\mathbb{S}^1_{+1,\sim}$ is decidable [5]. This already implies that there is no
effective translation of formulas into automata.

Consider the set $\mathcal{G}$ of grids of the form $H_1.C.H_2$ where $C$ is a single column
of $c$-labeled nodes, and $H_1$ and $H_2$ are grids with labels from $\{a, b\}$ such that
the sets of different column words (over $\{a, b\}$) in $H_1$ and $H_2$ coincide. We know
that $\mathcal{G}$ is MSO-definable in the signature of a grid. Therefore, the encoding $L$ of
$\mathcal{G}$ into data words is $\mathrm{rMSO}(\mathbb{S}^1_{+1,\sim})$-definable. Using an argument from [24], we
show that $L \notin \mathrm{CRA}(\mathbb{S}^1_{+1,\sim})$. First observe that the number of distinct sets of
column words over $\{a, b\}$ of length $n$ is $2^{2^n}$. Suppose, towards a contradiction,
that there is a class register automaton $\mathcal{A} = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \Phi)$ such that
$L(\mathcal{A}) = L$. Without loss of generality, we assume that $\Phi$ is given in terms of a
simple set of global final states. In a run of $\mathcal{A}$ on the data-word encoding of grid
$H_1.C.H_2$ of height $n$, all the information that $\mathcal{A}$ has about $H_1$ must be encoded
in the $n$ configurations that are taken while reading the $c$-labeled positions. The
number of tuples of $n$ configurations that $\mathcal{A}$ can distinguish is bounded by

$$N = |Q|^n \cdot 2^{(|R| \cdot n)^2} \cdot (n+1)^{|R| \cdot n} .$$

Here, the second factor is an upper bound on the number of equivalence relations
on the set $\{1, \ldots, |R| \cdot n\}$, which captures guessed values, and the third factor
is the number of register assignments. Now, as $Q$ and $R$ are fixed, $N$ does not
grow sufficiently fast, so that $\mathcal{A}$ will accept a data word outside $L$.
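
The growth gap behind this counting argument can be checked numerically. The sketch below uses the bound $N = |Q|^n \cdot 2^{(|R| n)^2} \cdot (n+1)^{|R| n}$ as reconstructed above and searches for the first height at which the number $2^{2^n}$ of column sets exceeds it; the function names and the search limit are assumptions of this illustration.

```python
from typing import Optional


def config_bound(num_states: int, num_registers: int, n: int) -> int:
    """Bound N on the number of distinguishable n-tuples of configurations."""
    k = num_registers * n
    return num_states ** n * 2 ** (k * k) * (n + 1) ** k


def column_sets(n: int) -> int:
    """Number of distinct sets of column words of length n over {a, b}."""
    return 2 ** (2 ** n)


def first_height_exceeding(num_states: int, num_registers: int, limit: int = 20) -> Optional[int]:
    """Smallest n <= limit with 2^(2^n) > N, or None if there is none up to the limit."""
    for n in range(1, limit + 1):
        if column_sets(n) > config_bound(num_states, num_registers, n):
            return n
    return None
```

Since $\log_2 N$ is polynomial in $n$ while $\log_2 2^{2^n} = 2^n$, such a height exists for every fixed $|Q|$ and $|R|$; from that height on, two grids with different column sets must lead to the same tuple of configurations.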

$\mathrm{CRA}(\mathbb{S}^1_{+1}) \subseteq \mathrm{CRA}^-(\mathbb{S}^1_{+1})$: Note first that class register automata over $\mathbb{S}^1_{+1}$
are a variant of the register automata with non-deterministic reassignment from
[18]. The crucial difference is that the "look-ahead" of $\mathrm{CRA}(\mathbb{S}^1_{+1})$ is bounded,
while the automata from [18] can guess any arbitrary data value. As a
consequence, the latter capture the set of data words such that all data values (except
the last one) are different from the last data value. We will show that, on the
other hand, class register automata over $\mathbb{S}^1_{+1}$ are no more expressive than
classical register automata, which cannot recognize that language.

Let $\mathcal{A} = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \Phi)$ be a class register automaton over $\mathbb{S}^1_{+1}$.
We sketch the construction of a non-guessing class register automaton
$\mathcal{A}' = (Q', R', \Delta', (F'_\lhd)_{\lhd \in \mathbb{S}}, \Phi')$ over $\mathbb{S}^1_{+1}$ such that $L(\mathcal{A}) = L(\mathcal{A}')$. Let $\mathfrak{B}$ be the
maximal value $B$ such that an update of $\mathcal{A}$ is of the form $f(r) = (k, B)$. Without loss
of generality, we assume that $\mathfrak{B} \geq 1$ exists. The idea is that $\mathcal{A}'$ keeps track of
the register contents of the last $\mathfrak{B}$ positions, and of the last $\mathfrak{B}$ data values read.
To this aim, we set $R' = \{-\mathfrak{B}, \ldots, -1\} \times (R \uplus \{\mathit{current}\})$. Register $(-i, \mathit{current})$
contains the $i$-th last input data value (w.r.t. the next position to read), and
register $(-i, r)$ simulates register assignments of $\mathcal{A}$ for $r$. In particular, this allows
us to access every input data value from the last $B \leq \mathfrak{B}$ positions. In order to
anticipate data values, a state of $\mathcal{A}'$ contains, apart from a state of $\mathcal{A}$, an
equivalence relation over both the new set of registers $R'$ and the next $\mathfrak{B}$ positions.
Thus, a state of $\mathcal{A}'$ is a pair $(q, \sim)$ where $q \in Q$ and $\sim$ is an equivalence relation
over $R'$ together with the next $\mathfrak{B}$ positions.

To simulate an update $f(r) = (k, B)$ of $\mathcal{A}$ with $B \geq 1$, $\mathcal{A}'$ either writes the
current value or one of the values stored in $(-B, \mathit{current}), \ldots, (-1, \mathit{current})$ into
$r$, or goes into a state in which $r$ and at least one of the next $B$ positions are
considered equivalent. Of course, the equivalence has to be globally consistent
and locally consistent, meaning that two equivalent registers should contain the
same data value. Moreover, when $\mathcal{A}'$ is in a state where the next position and a
defined register $r$ are considered equivalent, then the next symbol to read is the
contents of $r$. If, in contrast, the next position is not equivalent to some defined
register, then $\mathcal{A}'$ should read a data value that is currently not stored, and store
it in $r$ (unless another update for $r$ applies). This finally ensures that a suitable
data value in terms of an equivalence relation has been guessed when performing
an update of the form $f(r) = (k, B)$. □

The remaining (strict) inclusions are left open. When there are no data values, 
we have expressive equivalence of EMSO logic and class register automata (which 
then reduce to class memory automata) . The translation from automata to logic 
follows the standard approach. The following theorem is a proper generalization 
of the main result of [S]. 

Theorem 5. Suppose $m = 0$. For every signature $\mathbb{S}$, $\mathrm{EMSO}(\mathbb{S}) = \mathrm{CRA}(\mathbb{S})$.

7 Infinite Data Words

In the realm of reactive systems, it is appropriate to consider infinite data words, 
i.e., sequences from the set $(\Sigma \times D^m)^\omega$. Note that all the notions that we
introduced in Section [5] carry over to the new domain. In particular, a formula from
$\mathrm{rMSO}(\mathbb{S})$ is interpreted over an infinite word $w$ without modifying the definition.
However, its fragment $\mathrm{rEMSO}(\mathbb{S})$ now appears limited. In terms of $\mathbb{S}_{\mathrm{dyn}}$, one
cannot express "some process sends infinitely many messages during an execution",
as can be shown using Hanf's Theorem. We therefore introduce a first-order
quantifier $\exists^\infty$. Formula $\exists^\infty x\, \varphi$ is satisfied by $w = w_1 w_2 \ldots \in (\Sigma \times D^m)^\omega$ if there
are infinitely many positions $i \geq 1$ such that $\varphi$ is satisfied when $x$ is interpreted
as $i$. We obtain the logics $\mathrm{rFO}^\infty(\mathbb{S})$ and $\mathrm{rEMSO}^\infty(\mathbb{S})$ as well as the language class
$\mathrm{rEMSO}^\infty(\mathbb{S})$. Now, a translation from logic into automata requires an extension
of class register automata. We define an $\omega$-class register automaton (over $\mathbb{S}$) to
be a tuple $\mathcal{A} = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \Phi)$ where $Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}$ are as in class
register automata, and $\Phi$ is henceforth a boolean formula over $\{$'$q = \infty$' $\mid q \in Q\}
\cup \{$'$q < N$' $\mid q \in Q$ and $N \in \mathbb{N}\}$. Infinite runs $(q_1, \rho_1)(q_2, \rho_2) \ldots$ and
satisfaction of the new global acceptance condition are defined as one would expect.
In particular, atom $q = \infty$ is satisfied if $|\{i \geq 1 \mid q_i = q\}| = \infty$. The class
of languages recognized by $\omega$-class register automata is denoted by $\omega\text{-}\mathrm{CRA}(\mathbb{S})$.
Theorems 1 and 5 extend to infinite words.
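
On an ultimately periodic run, both kinds of atoms can be decided from a finite description. The sketch below assumes that the state sequence of the run has the lasso shape `prefix` followed by `loop` repeated forever, with `loop` non-empty; this restriction and the function names are assumptions of the illustration, not part of the definition above.

```python
from collections import Counter
from typing import Hashable, Sequence


def atom_infinite(prefix: Sequence[Hashable], loop: Sequence[Hashable], q: Hashable) -> bool:
    """Atom 'q = infinity': q occurs infinitely often iff it occurs in the loop."""
    if not loop:
        raise ValueError("the loop of a lasso must be non-empty")
    return q in loop


def atom_less(prefix: Sequence[Hashable], loop: Sequence[Hashable], q: Hashable, N: int) -> bool:
    """Atom 'q < N': q occurs fewer than N times in the whole infinite run."""
    if atom_infinite(prefix, loop, q):
        return False
    return Counter(prefix)[q] < N
```

A state that occurs in the loop violates every atom '$q < N$', whereas a state that only occurs in the prefix is counted exactly.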

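Theorem 6. For all $\mathbb{S}$, we have $\mathrm{rEMSO}^\infty(\mathbb{S}) \subseteq \omega\text{-}\mathrm{CRA}(\mathbb{S})$. The size of the
automaton is elementary in the size of the formula and $|\mathbb{S}|$. If $m = 0$, then
$\mathrm{rEMSO}^\infty(\mathbb{S}) = \omega\text{-}\mathrm{CRA}(\mathbb{S})$.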

Proof. The crucial observation is that Proposition 1 still holds. We actually take
the same automaton $\mathcal{A}_B$ and run it on infinite words. The argument that makes
the construction work relies on the fact that the past of any word position
is finite. Moreover, it was shown in [7] that Theorem 2 has a counterpart for
formulas with infinity quantifier. The proof is based on Vinner's extension of
Ehrenfeucht-Fraïssé games [26]. Thus, for $\varphi \in \mathrm{rFO}^\infty(\mathbb{S})$, there are $B \in \mathbb{N}$ and a
boolean formula $\beta$ over $\{$'$S = \infty$', '$S < N$' $\mid S \in B\text{-}\mathit{Spheres}_{\mathbb{S}}$ and $N \in \mathbb{N}\}$ such
that $L(\varphi)$ is the set of data words that satisfy $\beta$. With this, the constructions
from Section 5 can be adapted to translate an $\mathrm{rEMSO}^\infty(\mathbb{S})$-sentence into an
$\omega$-class register automaton over $\mathbb{S}$. □

We remark that the proof of Theorem 6 is not effective. Unlike the proof of
Theorem 1, it does not rely on [8, 16]. We do not know if there is an effective
alternative.

8 Conclusion 

We studied the realizability problem for data-word languages. A particular case 
of this general framework constitutes a first step towards a logically motivated 
automata theory for dynamic message-passing systems. In light of this, it would 
be desirable to synthesize smaller and deadlock-free automata from logical or 
algebraic specifications. A good starting point for those studies may be temporal 

logic nung. 

Our approach to modeling systems over infinite alphabets may also lead to
meaningful model-checking questions. It would be interesting to extend [20],
whose logic corresponds to ours in the case of $\mathbb{S}_{\mathrm{dyn}}$, to general data words.

A. Correctness of sphere automaton

We will show that the class register automaton $\mathcal{A}_B = (Q, R, \Delta, (F_\lhd)_{\lhd \in \mathbb{S}}, \Phi)$ over
$\mathbb{S}$ and the mapping $\pi : Q \to B\text{-}\mathit{Spheres}_{\mathbb{S}}$ are correct in the sense of Proposition 1:
$L(\mathcal{A}_B) = (\Sigma \times D^m)^*$ and, for every data word $w = w_1 \ldots w_n$ (where $w_i =
(a_i, d_i)$), every accepting run $(q_1, \rho_1) \ldots (q_n, \rho_n)$ of $\mathcal{A}_B$ on $w$, and every position
$i \in [n]$, $\pi(q_i) = B\text{-}\mathit{Sph}^w(i)$.

Every data word is accepted. Let us first show $L(\mathcal{A}_B) = (\Sigma \times D^m)^*$, i.e.,
that every data word is accepted by $\mathcal{A}_B$. Let $w = (a_1, d_1) \ldots (a_n, d_n) \in (\Sigma \times
D^m)^*$ be any data word and let $G(w) = ([n], (\lhd^w)_{\lhd \in \mathbb{S}}, \lambda, \nu)$ be its associated
graph. We have to show $w \in L(\mathcal{A}_B)$. A key issue is the assignment of colors to
word positions in $w$ such that overlapping spheres can be verified simultaneously.
Let $i, i' \in [n]$. We say that $i$ and $i'$ have a $B$-overlap in $w$ if both $B\text{-}\mathit{Sph}^w(i) \cong
B\text{-}\mathit{Sph}^w(i')$ and $\mathit{dist}^w(i, i') \leq 2B + 1$.

Lemma 3. There is a mapping $\Phi : [n] \to \{1, \ldots, (2|\mathbb{S}| + 1) \cdot \mathit{maxSize}^2 + 1\}$ such
that $\Phi(i) \neq \Phi(i')$ whenever $i$ and $i'$ are distinct and have a $B$-overlap.

Proof. We obtain $\Phi$ as a coloring of the undirected graph $([n], \mathit{Arcs})$ where two
nodes are connected iff they are distinct and have a $B$-overlap. The graph has
degree at most $(2|\mathbb{S}| + 1) \cdot \mathit{maxSize}^2$ so that it can be $((2|\mathbb{S}| + 1) \cdot \mathit{maxSize}^2 + 1)$-colored
by some mapping $\Phi$, i.e., $\Phi(i) \neq \Phi(i')$ for every edge $\{i, i'\}$. □
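
The coloring argument is the standard one: a graph of maximal degree $\Delta$ can be colored greedily with $\Delta + 1$ colors. A minimal sketch, assuming the overlap graph is given as an adjacency dictionary (the function name `greedy_coloring` is illustrative):

```python
from typing import Dict, Hashable, Iterable


def greedy_coloring(adj: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Assign to each node the smallest color in {1, 2, ...} unused by its colored neighbors."""
    color: Dict[Hashable, int] = {}
    for node in adj:
        used = {color[nb] for nb in adj[node] if nb in color}
        c = 1
        while c in used:
            c += 1
        color[node] = c
    return color
```

A node with at most $\Delta$ neighbors sees at most $\Delta$ forbidden colors, so the color chosen never exceeds $\Delta + 1$, independently of the order in which nodes are processed.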

We now define a sequence $\xi = (q_1, \rho_1) \ldots (q_n, \rho_n)$ of configurations of $\mathcal{A}_B$ and
show that $\xi$ is an accepting run of $\mathcal{A}_B$ on $w$. Let $i \in [n]$. We set

$$q_i = \{\, (B\text{-}\mathit{Sph}^w(i_c), i, \Phi(i_c)) \mid i_c \in [n] \text{ such that } \mathit{dist}^w(i_c, i) \leq B \,\}.$$

Suppose $E = (S, \alpha, \mathit{col})$, $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$, and $k \in [m]$. We define
$\rho_i((E, k))$ as follows. If there are positions $i_c, i' \in [n]$ such that $\mathit{dist}^w(i_c, i) \leq B$,
$\mathit{dist}^w(i_c, i') \leq B$, $(S, \alpha) \cong (B\text{-}\mathit{Sph}^w(i_c), i')$, and $\mathit{col} = \Phi(i_c)$, then we set
$\rho_i((E, k)) = d^k_{i'}$. Otherwise, we let $\rho_i((E, k))$ be undefined. Note that $\rho_i((E, k))$
is well defined, as there is at most one pair $i_c, i'$ satisfying the above properties.

We check that $q_i$ is a state. Let $E = (S, \alpha, \mathit{col}) \in q_i$ and $E' = (S', \alpha', \mathit{col}') \in q_i$
with $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$ and $S' = (U', (\lhd^{E'})_{\lhd \in \mathbb{S}}, \lambda', \nu', \gamma')$.

(i) Assume $\gamma = \alpha$ and $\gamma' = \alpha'$. Then, $(S, \gamma) \cong (B\text{-}\mathit{Sph}^w(i), i)$ and $(S', \gamma') \cong
(B\text{-}\mathit{Sph}^w(i), i)$. Thus, $(S, \gamma) \cong (S', \gamma')$. Moreover, $\mathit{col} = \mathit{col}' = \Phi(i)$.

(ii) Clearly, we have $\lambda(\alpha) = \lambda'(\alpha')$ and $\nu(\alpha) = \nu'(\alpha')$.

(iii) Suppose $S \cong S'$ ($S = S'$, for simplicity) and $\mathit{col} = \mathit{col}'$. According to the
definition of $q_i$, there are positions $i_1, i_2$ of $w$ such that $\mathit{dist}^w(i, i_1) \leq B$,
$\mathit{dist}^w(i, i_2) \leq B$, $(S, \alpha) \cong (B\text{-}\mathit{Sph}^w(i_1), i)$, $(S, \alpha') \cong (B\text{-}\mathit{Sph}^w(i_2), i)$, and
$\mathit{col} = \Phi(i_1) = \Phi(i_2)$. We have $(B\text{-}\mathit{Sph}^w(i_1), i) \cong (B\text{-}\mathit{Sph}^w(i_2), i)$. As $i_1$ and
$i_2$ have a $B$-overlap, we also have, by Lemma 3, $i_1 = i_2$. We deduce $\alpha = \alpha'$.

Next, we define a tuple $t_i = (p_i, g_i) \xrightarrow{a_i} (q_i, f_i)$ for all $i \in [n]$. We let
$(p_i)_\lhd = q_{\mathit{prev}^w_\lhd(i)}$ (which might be undefined). Moreover, let $g_i$ and $f_i$ be uniquely given by
conditions T7 and T8 where we replace $q$ with $q_i$. Before we check that conditions
(1)-(4) of a run are satisfied, we verify that $t_i$ is indeed a transition. In the
following, we let $E$ always refer to $E = (S, \alpha, \mathit{col})$ with $S = (U, (\lhd^E)_{\lhd \in \mathbb{S}}, \lambda, \nu, \gamma)$.

Tl Obviously, we have labelfa) = a,. 

T2 Let < G §\dom(pi) (which implies that prev^j(i) is undefined) and E G (ft. We 
have (S*, a) = (B-Sph w (i c ),i) for some i c with dist w (i c ,i) < B. As prev^(i) 
is undefined, we conclude that prev^(a) is undefined, too. 

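T3 Let $\lhd \in \mathit{dom}(p_i)$, $E \in q_i$, $j \in U$, and $i_{\lhd} = \mathit{prev}^w_{\lhd}(i)$.

Suppose $j \lhd^E \alpha$. We need to show $E[j] \in q_{i_{\lhd}}$. As $E \in q_i$, there is $i_c \in [n]$
such that $\mathit{dist}^w(i_c,i) \le B$, $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$, and $\mathit{col}$ is the color of $i_c$. Since
$\mathit{dist}^E(\gamma,j) \le B$ implies $\mathit{dist}^w(i_c,i_{\lhd}) \le B$, and since $(S,j) \cong (B\text{-}\mathit{Sph}^w(i_c),i_{\lhd})$
and $\mathit{col}$ is the color of $i_c$, we deduce $E[j] = (S,j,\mathit{col}) \in q_{i_{\lhd}}$.

Conversely, suppose $E[j] \in q_{i_{\lhd}}$. We shall show $j \lhd^E \alpha$. There are positions
$i_c, i'_c \in [n]$ such that we have $\mathit{dist}^w(i_c,i) \le B$, $\mathit{dist}^w(i'_c,i_{\lhd}) \le B$, $(S,\alpha) \cong
(B\text{-}\mathit{Sph}^w(i_c),i)$, $(S,j) \cong (B\text{-}\mathit{Sph}^w(i'_c),i_{\lhd})$, and $\mathit{col}$ is the color of both $i_c$ and $i'_c$. Note
that $i_c$ and $i'_c$ have a $B$-overlap. By Lemma 3, $i_c = i'_c$. As, then, $(S,j) \cong
(B\text{-}\mathit{Sph}^w(i'_c),i_{\lhd})$, $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i'_c),i)$, and $i_{\lhd} \lhd^w i$, we can deduce $j \lhd^E \alpha$.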

T4 is shown similarly to T3. 

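T5 Let $\lhd \in \mathit{dom}(p_i)$ and $E \in q_i$ such that $\mathit{prev}^E_{\lhd}(\alpha)$ is undefined. There is $i_c \in
[n]$ such that $\mathit{dist}^w(i_c,i) \le B$ and $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$. Now, suppose
$\mathit{dist}^E(\gamma,\alpha) < B$. But then, we also have $\mathit{dist}^w(i_c,i) < B$ and $\mathit{prev}^E_{\lhd}(\alpha)$ is
defined, a contradiction. We deduce that $\mathit{dist}^E(\gamma,\alpha) = B$.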

T6 is shown similarly to T5. 

T7 and T8 are immediate. 

So far, we know that $t_i$ is a transition. Now, let us check the run conditions.
(1) and (2) are readily verified.

(3) Consider guard $g_i = g_1 \wedge g_2 \wedge g_3$. We first check subformula $g_1$. For $k_1, k_2 \in
[m]$, by the definition of $q_i$ and $G(w)$, $k_1 \sim_{q_i} k_2$ iff $d_i^{k_1} = d_i^{k_2}$. Now, consider
$g_2$ and an atomic subformula $k = (\lhd,(E,k))$ where $k \in [m]$, $E \in q_i$, and $\lhd \in
\mathit{type}^-(\alpha)$. Set $i_{\lhd} = \mathit{prev}^w_{\lhd}(i)$, which must indeed exist (by T2). As $E \in q_i$,
there is $i_c \in [n]$ such that $\mathit{dist}^w(i_c,i) \le B$, $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$, and
$\mathit{col}$ is the color of $i_c$. This implies $\mathit{dist}^w(i_c,i_{\lhd}) \le B$, and we obtain $\rho_{i_{\lhd}}((E,k)) = d_i^k$
so that $g_2$ also holds. Finally, we have to check $g_3$. Consider its subformula
$(\lhd_1,(E[j],k)) = (\lhd_2,(E[j],k))$ where $k \in [m]$, $E \in q_i$, $j \in U$, and $\lhd_1, \lhd_2 \in
\mathit{type}^-(\alpha)$. Let $i_1 = \mathit{prev}^w_{\lhd_1}(i)$ and $i_2 = \mathit{prev}^w_{\lhd_2}(i)$ (they both exist). Moreover,
let $j_1 = \mathit{prev}^E_{\lhd_1}(\alpha)$ and $j_2 = \mathit{prev}^E_{\lhd_2}(\alpha)$. As $E \in q_i$, there is $i_c \in [n]$ such
that $\mathit{dist}^w(i_c,i) \le B$, $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$, and $\mathit{col}$ is the color of $i_c$. Due to
the isomorphism, there is a unique $i' \in [n]$ such that $\mathit{dist}^w(i_c,i') \le B$ and
$(S,j) \cong (B\text{-}\mathit{Sph}^w(i_c),i')$. Moreover, we have $(S,j_1) \cong (B\text{-}\mathit{Sph}^w(i_c),i_1)$ and
$(S,j_2) \cong (B\text{-}\mathit{Sph}^w(i_c),i_2)$. In particular, $\mathit{dist}^w(i_c,i_1) \le B$ and $\mathit{dist}^w(i_c,i_2) \le
B$. We deduce $\rho_{i_1}((E[j],k)) = \rho_{i_2}((E[j],k)) = d_{i'}^k$. Thus, $g_3$ is satisfied.
(4) Let $(E,k) \in R$. We distinguish three cases.

• If there is $j \in U$ such that $E[j] \in q_i$ and $\mathit{type}^-(j) \neq \emptyset$, then we have
$f_i((E,k)) = (\lhd,(E,k))$ with $\lhd = \lhd_{E[j]}$. Since $E[j] \in q_i$, there is a
position $i_c \in [n]$ such that $\mathit{dist}^w(i_c,i) \le B$, $(S,j) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$,
and $\mathit{col}$ is the color of $i_c$. Moreover, there is a unique position $i' \in [n]$ such that
$\mathit{dist}^w(i_c,i') \le B$ and $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i')$. As $j_{\lhd} = \mathit{prev}^E_{\lhd}(j)$ is
defined, $i_{\lhd} = \mathit{prev}^w_{\lhd}(i)$ is defined, too. Note that $(S,j_{\lhd}) \cong (B\text{-}\mathit{Sph}^w(i_c),i_{\lhd})$
and $\mathit{dist}^w(i_c,i_{\lhd}) \le B$. We obtain $\rho_i((E,k)) = d_{i'}^k = \rho_{i_{\lhd}}((E,k))$.

• If there is $j \in U$ such that $E[j] \in q_i$ and $\mathit{type}^-(j) = \emptyset$, then $f_i((E,k)) =
(k,\mathit{dist}^E(\alpha,j))$. We show $\rho_i((E,k)) \in D^k_{\delta}(i)$ where $\delta = \mathit{dist}^E(\alpha,j)$.
As $E[j] \in q_i$, there is $i_c \in [n]$ such that $\mathit{dist}^w(i_c,i) \le B$, $(S,j) \cong
(B\text{-}\mathit{Sph}^w(i_c),i)$, and $\mathit{col}$ is the color of $i_c$. Thus, there is a unique position $i' \in
[n]$ such that $\mathit{dist}^w(i_c,i') \le B$ and $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i')$. We have
$\mathit{dist}^w(i',i) \le \mathit{dist}^E(\alpha,j)$, and we can deduce $\rho_i((E,k)) = d_{i'}^k \in D^k_{\delta}(i)$.

• If there is no $j \in U$ such that $E[j] \in q_i$, then $f_i((E,k))$ is undefined.
Therefore, $\rho_i((E,k))$ should be undefined, too. Suppose, towards a contradiction,
that $\rho_i((E,k)) \in \mathfrak{D}$. Then, there are $i_c, i' \in [n]$ such that
we have $\mathit{dist}^w(i_c,i) \le B$, $\mathit{dist}^w(i_c,i') \le B$, $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i')$,
and $\mathit{col}$ is the color of $i_c$. But then, there is a unique $j \in U$ such that $(S,j) \cong
(B\text{-}\mathit{Sph}^w(i_c),i)$ so that $E[j] \in q_i$, which is a contradiction.

We conclude that $(q_1,\rho_1) \ldots (q_n,\rho_n)$ is a run. Let us quickly verify that it is accepting. Trivially,
$\Phi = \mathit{true}$ is satisfied. Now suppose $\lhd \in \mathfrak{S}$ and consider any position $i \in [n]$ such
that $\mathit{next}^w_{\lhd}(i)$ is undefined. We have to show that $q_i$ is contained in $F_{\lhd}$, i.e.,
$\mathit{next}^E_{\lhd}(\alpha)$ is undefined for all $E \in q_i$. So suppose $E \in q_i$. There is $i_c \in [n]$ such
that $\mathit{dist}^w(i_c,i) \le B$ and $(S,\alpha) \cong (B\text{-}\mathit{Sph}^w(i_c),i)$. As $\mathit{next}^w_{\lhd}(i)$ is undefined,
$\mathit{next}^E_{\lhd}(\alpha)$ must be undefined, too.

Every run keeps track of spheres. In this part of the proof, we show that
we can infer, from every accepting run of $\mathcal{A}_B$ on data word $w$, the spheres that
occur in $G(w)$.

Let $w = (a_1,d_1) \ldots (a_n,d_n) \in (\Sigma \times \mathfrak{D}^m)^*$ be a data word and
$G(w) = ([n],(\lhd^w)_{\lhd \in \mathfrak{S}},\lambda,\nu)$ its graph. Suppose $(q_1,\rho_1) \ldots (q_n,\rho_n)$ is an accepting
run of $\mathcal{A}_B$ on $w$ with corresponding transitions $t_1,\ldots,t_n$ where $t_i = (p_i,g_i) \to (q_i,f_i)$.

The following claim states that an arbitrarily long path of an extended sphere 
E that starts in its active node is faithfully simulated by w. It will turn out to 
be crucial that, hereby, the data values in registers of the form (E[j],k) are 
invariant during that simulation. 

Lemma 4. Let $i \in [n]$ be some position, $e \ge 0$, and $E = (S,\alpha,\mathit{col}) \in q_i$ with
$S = (U,(\lhd^E)_{\lhd \in \mathfrak{S}},\lambda,\nu,\gamma)$. Suppose there are $j_0,\ldots,j_e \in U$ and $\lhd_1,\ldots,\lhd_e \in \mathfrak{S}$
such that $\alpha = j_0$ and, for all $z \in \{0,\ldots,e-1\}$, $j_z \lhd^E_{z+1} j_{z+1}$ or $j_{z+1} \lhd^E_{z+1} j_z$.
Then, there is a unique sequence $i = i_0,\ldots,i_e \in [n]$ such that the following hold:

- for each $z \in \{0,\ldots,e-1\}$, $j_z \lhd^E_{z+1} j_{z+1}$ implies $i_z \lhd^w_{z+1} i_{z+1}$, and
$j_{z+1} \lhd^E_{z+1} j_z$ implies $i_{z+1} \lhd^w_{z+1} i_z$

- for each $z \in \{0,\ldots,e\}$, we have $E[j_z] \in q_{i_z}$, $\lambda(j_z) = a_{i_z}$, and $\nu(j_z) = \nu(i_z)$

- for each $z \in \{1,\ldots,e\}$, $k \in [m]$, and $j \in U$, we have $\rho_{i_z}((E[j],k)) =
\rho_{i_{z-1}}((E[j],k))$

- for each $z \in \{0,\ldots,e\}$ and $k \in [m]$, we have that $\rho_{i_z}((E[j_z],k)) = d_{i_z}^k$

Proof. We proceed by induction on $e$. Suppose $e = 0$. By T1 and guard $g_1$
of T7, $\lambda(\alpha) = a_i$ and $\nu(\alpha) = \nu(i)$. Let $k \in [m]$ and suppose $\mathit{type}^-(\alpha) \neq
\emptyset$. Then, $f_i((E,k)) = (\lhd,(E,k))$ where we let $\lhd = \lhd_E$. Thus, $\rho_i((E,k)) =
\rho_{\mathit{prev}^w_{\lhd}(i)}((E,k))$. By guard $g_2$ of T7, we have $\rho_i((E,k)) = d_i^k$. If $\mathit{type}^-(\alpha) = \emptyset$,
then $\rho_i((E,k)) = d_i^k$ is due to the update $f_i((E,k)) = (k,0)$ (T8).

So let $e \ge 0$, $j_0,\ldots,j_e,j_{e+1} \in U$, and $\lhd_1,\ldots,\lhd_e,\lhd_{e+1} \in \mathfrak{S}$ such that $\alpha = j_0$
and, for every $z \in \{0,\ldots,e\}$, $j_z \lhd^E_{z+1} j_{z+1}$ or $j_{z+1} \lhd^E_{z+1} j_z$. Let $i_0,\ldots,i_e \in [n]$
be the unique corresponding sequence with the required properties. We consider
two cases:

- Assume $j_e \lhd^E_{e+1} j_{e+1}$. Then, $q_{i_e} \notin F_{\lhd_{e+1}}$ so that $\mathit{next}^w_{\lhd_{e+1}}(i_e)$ is defined. We
set $i_{e+1} = \mathit{next}^w_{\lhd_{e+1}}(i_e)$.

Due to T4, we have $E[j_{e+1}] \in q_{i_{e+1}}$. By T1 and guard $g_1$ of T7, we obtain
$\lambda(j_{e+1}) = a_{i_{e+1}}$ and $\nu(j_{e+1}) = \nu(i_{e+1})$.

Let $k \in [m]$ and $j \in U$. Due to condition T8, $E[j_{e+1}] \in q_{i_{e+1}}$ implies that
$f_{i_{e+1}}((E[j],k)) = (\lhd,(E[j],k))$ for some $\lhd \in \mathfrak{S}$. Due to guard $g_3$ of condition
T7, we have $\rho_{\mathit{prev}^w_{\lhd}(i_{e+1})}((E[j],k)) = \rho_{i_e}((E[j],k))$. We can now deduce
$\rho_{i_{e+1}}((E[j],k)) = \rho_{i_e}((E[j],k))$.

Finally, let $k \in [m]$. We have $f_{i_{e+1}}((E[j_{e+1}],k)) = (\lhd,(E[j_{e+1}],k))$ where we
let $\lhd = \lhd_{E[j_{e+1}]}$. Thus, $\rho_{i_{e+1}}((E[j_{e+1}],k)) = \rho_{\mathit{prev}^w_{\lhd}(i_{e+1})}((E[j_{e+1}],k))$. By
guard $g_2$ of T7, we obtain $\rho_{i_{e+1}}((E[j_{e+1}],k)) = d_{i_{e+1}}^k$.

- Assume $j_{e+1} \lhd^E_{e+1} j_e$. By T2, $\lhd_{e+1} \in \mathit{dom}(p_{i_e})$. Thus, there is (a unique) $i_{e+1}$
such that $i_{e+1} \lhd^w_{e+1} i_e$.

By T3, we have $E[j_{e+1}] \in q_{i_{e+1}}$. Moreover, $\lambda(j_{e+1}) = a_{i_{e+1}}$ and $\nu(j_{e+1}) =
\nu(i_{e+1})$.

Let $k \in [m]$ and $j \in U$. By condition T8, $E[j_e] \in q_{i_e}$ implies
$f_{i_e}((E[j],k)) = (\lhd,(E[j],k))$ for some $\lhd \in \mathfrak{S}$. Due to guard $g_3$ of condition
T7, we have $\rho_{\mathit{prev}^w_{\lhd}(i_e)}((E[j],k)) = \rho_{i_{e+1}}((E[j],k))$. We deduce $\rho_{i_e}((E[j],k)) =
\rho_{i_{e+1}}((E[j],k))$.

Finally, let $k \in [m]$. We distinguish two cases. Suppose $\mathit{type}^-(j_{e+1}) \neq \emptyset$.
Then, $f_{i_{e+1}}((E[j_{e+1}],k)) = (\lhd,(E[j_{e+1}],k))$ where we let $\lhd = \lhd_{E[j_{e+1}]}$.
Thus, $\rho_{i_{e+1}}((E[j_{e+1}],k)) = \rho_{\mathit{prev}^w_{\lhd}(i_{e+1})}((E[j_{e+1}],k))$. By guard $g_2$ of T7, we
have $\rho_{i_{e+1}}((E[j_{e+1}],k)) = d_{i_{e+1}}^k$. If $\mathit{type}^-(j_{e+1}) = \emptyset$, then $\rho_{i_{e+1}}((E[j_{e+1}],k)) =
d_{i_{e+1}}^k$ is due to the update $f_{i_{e+1}}((E[j_{e+1}],k)) = (k,0)$ (T8).

This concludes the proof of Lemma 4. □
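
The statement of Lemma 4 can be illustrated by a small sketch. It assumes, beyond what the text above fixes, a signature with a successor relation and, for each component k, a relation linking a position to the next position that carries the same k-th data value; the names `next_pos`, `prev_pos`, and `simulate_path` are hypothetical. Since prev and next are partial functions, following a path of a sphere step by step yields at most one sequence of positions, which is the uniqueness claimed in the lemma.

```python
from typing import List, Optional, Tuple, Union

Letter = Tuple[str, Tuple[int, ...]]   # (label, tuple of data values)
Rel = Union[str, Tuple[str, int]]      # "succ" or ("same", k), an assumed encoding


def next_pos(word: List[Letter], rel: Rel, i: int) -> Optional[int]:
    """Position related to i in forward direction (1-based), or None."""
    n = len(word)
    if rel == "succ":
        return i + 1 if i < n else None
    _, k = rel
    for j in range(i + 1, n + 1):
        if word[j - 1][1][k] == word[i - 1][1][k]:
            return j
    return None


def prev_pos(word: List[Letter], rel: Rel, i: int) -> Optional[int]:
    """Position related to i in backward direction (1-based), or None."""
    if rel == "succ":
        return i - 1 if i > 1 else None
    _, k = rel
    for j in range(i - 1, 0, -1):
        if word[j - 1][1][k] == word[i - 1][1][k]:
            return j
    return None


def simulate_path(word: List[Letter], start: int,
                  steps: List[Tuple[Rel, str]]) -> Optional[List[int]]:
    """Follow steps (rel, "fwd" | "bwd") from start; None if a step is undefined."""
    positions = [start]
    for rel, direction in steps:
        step = next_pos if direction == "fwd" else prev_pos
        nxt = step(word, rel, positions[-1])
        if nxt is None:
            return None
        positions.append(nxt)
    return positions


example_word = [("a", (1,)), ("a", (2,)), ("a", (1,))]
example_path = simulate_path(example_word, 3, [(("same", 0), "bwd"), ("succ", "fwd")])
```

In the example, the path first moves backwards along the data relation from position 3 and then one step forward along the successor relation.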

By means of Lemma 4, we will show that spheres that are contained in states
indeed occur in a data word. It will be used in combination with the following
simple monotonicity fact, which follows easily from the definitions.
Next, we show that a sphere correctly simulates $w$ and vice versa, which
concludes the correctness proof for $\mathcal{A}_B$.

For $i \in [n]$, let $E_i = (S_i,\alpha_i,\mathit{col}_i)$ with $S_i := (U_i,(\lhd^{E_i})_{\lhd \in \mathfrak{S}},\lambda_i,\nu_i,\gamma_i)$ be the
unique extended sphere from $q_i$ such that $\gamma_i = \alpha_i$. In particular, $S_i = \pi(q_i)$.

Lemma 5. For all $i \in [n]$, we have $B\text{-}\mathit{Sph}^w(i) \cong S_i$.

Proof. For $e \in \{0,\ldots,B\}$, let $e\text{-}S_i$ denote the $e$-sphere of $(U_i,(\lhd^{E_i})_{\lhd \in \mathfrak{S}})$
around $\gamma_i$, which is defined in the canonical manner. We show, by induction, the
following more general statement:

(*) For every $e \in \{0,\ldots,B\}$, there is an isomorphism $h : e\text{-}\mathit{Sph}^w(i) \to e\text{-}S_i$
such that, for each $i' \in [n]$ with $\mathit{dist}^w(i,i') \le e$, we have $E_i[h(i')] \in q_{i'}$.

We easily verify that (*) holds for $e = 0$. Now suppose there is an isomorphism
$h : e\text{-}\mathit{Sph}^w(i) \to e\text{-}S_i$ with $e < B$. We extend the domain of $h$ to elements $i'$
with $\mathit{dist}^w(i,i') = e + 1$ as follows. Let $i_1, i_2 \in [n]$ such that $\mathit{dist}^w(i,i_1) = e$ and
$\mathit{dist}^w(i,i_2) = e + 1$. Let $\lhd \in \mathfrak{S}$. We distinguish several cases:

- Suppose $i_1 \lhd^w i_2$. Since $\mathit{dist}^w(i,i_1) < B$, we have $\mathit{dist}^{E_i}(\gamma_i,h(i_1)) < B$. By
T6, there is $j_2 \in U_i$ such that $h(i_1) \lhd^{E_i} j_2$. Since $E_i[h(i_1)] \in q_{i_1}$, we obtain,
by T1, T4, and T7, $\lambda_i(j_2) = a_{i_2}$, $\nu_i(j_2) = \nu(i_2)$, and $E_i[j_2] \in q_{i_2}$.

- Suppose $i_2 \lhd^w i_1$. Similarly, due to $\mathit{dist}^w(i,i_1) < B$ and T5, there is $j_2 \in
U_i$ such that $j_2 \lhd^{E_i} h(i_1)$. Using T1, T3, and T7, we obtain $\lambda_i(j_2) = a_{i_2}$,
$\nu_i(j_2) = \nu(i_2)$, and $E_i[j_2] \in q_{i_2}$.

We set $h(i_2) = j_2$ and leave $h(i')$ unchanged for all positions $i'$ in $e\text{-}\mathit{Sph}^w(i)$. In doing so,
we extend the domain of $h$ to elements with distance $e + 1$ from $i$. Note that
this extension $h : (e+1)\text{-}\mathit{Sph}^w(i) \to (e+1)\text{-}S_i$ is well defined, i.e., $j_2$ is uniquely
determined by $i_2$ and does not depend on the choice of $i_1$ or $\lhd$: if, for $i_2$, we
obtained distinct elements $j_2$ and $j_2'$, then $E_i[j_2] \in q_{i_2}$ and $E_i[j_2'] \in q_{i_2}$, which
contradicts the definition of a state.

We show that we obtain a homomorphism $h : (e+1)\text{-}\mathit{Sph}^w(i) \to (e+1)\text{-}S_i$.
Let $i_1, i_2 \in [n]$ such that $\mathit{dist}^w(i,i_1) = \mathit{dist}^w(i,i_2) = e + 1$. Moreover, let $\lhd \in \mathfrak{S}$.
Suppose $i_1 \lhd^w i_2$ (the case $i_2 \lhd^w i_1$ is symmetric). We have $E_i[h(i_1)] \in q_{i_1}$ and
$E_i[h(i_2)] \in q_{i_2}$. By T3 (or T4), this implies $h(i_1) \lhd^{E_i} h(i_2)$.

Next, we show that $h$ is surjective. Let $j_1, j_2 \in U_i$ and $\lhd \in \mathfrak{S}$ such that
$\mathit{dist}^{E_i}(\gamma_i,j_1) = e$, $\mathit{dist}^{E_i}(\gamma_i,j_2) = e + 1$, and $j_1 \lhd^{E_i} j_2$ (the case $j_2 \lhd^{E_i} j_1$ is
similar). We have $E_i[j_1] \in q_{h^{-1}(j_1)}$. By T4 and $q_{h^{-1}(j_1)} \notin F_{\lhd}$, there is $i_2 \in [n]$
such that $\mathit{dist}^w(i,i_2) = e + 1$, $h^{-1}(j_1) \lhd^w i_2$, and $E_i[j_2] \in q_{i_2}$. We deduce that $h$
is surjective.

Let us show that $h$ is injective. Let $i_1, i_2 \in [n]$ such that $\mathit{dist}^w(i,i_1) =
\mathit{dist}^w(i,i_2) = e + 1$. Assume $i_1 \neq i_2$. We show that, then, $h(i_1) \neq h(i_2)$. Let
$j_1 = h(i_1)$ and $j_2 = h(i_2)$. Assume, towards a contradiction, that $j_1 = j_2$.
Furthermore, assume $i_1 < i_2$ (the other case is symmetric). In $E_i$, there are
paths from $j_2$ to $\alpha_i$ and from $\alpha_i$ to $j_1$ that are simulated, in $w$, by paths from $i_2$
to $i$ and from $i$ to $i_1$, respectively. By Lemma 4 and monotonicity of a signature,
we can simulate these paths of $E_i$ arbitrarily often in $w$. This yields an infinite
descending chain $\ldots < i_1^3 < i_1^2 < i_1^1 < i_2$ such that $E_i[j_1] \in q_{i_1^l}$ and $d_{i_2}^k = d_{i_1}^k = d_{i_1^l}^k$
for all $l \ge 1$ and $k \in [m]$. But this is a contradiction, as every word position has
only finitely many smaller positions. The procedure is illustrated in Figure 7.

Fig. 7. $h$ is injective

Fig. 8. $h^{-1}$ is a homomorphism

Finally, we show that $h : (e+1)\text{-}\mathit{Sph}^w(i) \to (e+1)\text{-}S_i$ is actually an isomorphism.
Let $j_1, j_2 \in U_i$ and $\lhd \in \mathfrak{S}$ such that $\mathit{dist}^{E_i}(\gamma_i,j_1) = \mathit{dist}^{E_i}(\gamma_i,j_2) = e + 1$
and $j_1 \lhd^{E_i} j_2$. We show that this implies $h^{-1}(j_1) \lhd^w h^{-1}(j_2)$. Set $i_1 = h^{-1}(j_1)$
and $i_2 = h^{-1}(j_2)$. Assume, towards a contradiction, that $i_1 \lhd^w i_2$ does not hold. We have
$j_1 \neq j_2$, $E_i[j_1] \in q_{i_1}$, and $E_i[j_2] \in q_{i_2}$. Due to the definition of the set of states
of $\mathcal{A}_B$, this implies $i_1 \neq i_2$. Suppose $\mathit{next}^w_{\lhd}(i_1) < i_2$ (the other case is similar).
Again, by Lemma 4 and monotonicity of $\lhd^w$, we can build an infinite descending
chain $\ldots < i_1^3 < i_1^2 < i_1^1 < i_2$ such that $E_i[j_1] \in q_{i_1^l}$ for all $l \ge 1$ (cf. Figure 8).
This is a contradiction. □
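
The induction on e in the proof of Lemma 5 extends h one distance layer at a time. The following sketch computes these layers for a concrete data word by breadth-first search. It is only an illustration under an assumption not fixed by the text above: the signature consists of the successor relation and, per component k, the relation linking a position to the closest earlier or later position with the same k-th data value. The names `neighbours` and `sphere_layers` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Letter = Tuple[str, Tuple[int, ...]]   # (label, tuple of data values)


def neighbours(word: List[Letter], i: int) -> Set[int]:
    """Positions (1-based) adjacent to i in the assumed graph of the data word."""
    n = len(word)
    result: Set[int] = set()
    if i > 1:
        result.add(i - 1)
    if i < n:
        result.add(i + 1)
    for k, value in enumerate(word[i - 1][1]):
        earlier = [j for j in range(1, i) if word[j - 1][1][k] == value]
        later = [j for j in range(i + 1, n + 1) if word[j - 1][1][k] == value]
        if earlier:
            result.add(earlier[-1])   # closest earlier position with the same value
        if later:
            result.add(later[0])      # closest later position with the same value
    return result


def sphere_layers(word: List[Letter], center: int, radius: int) -> Dict[int, int]:
    """Map every position within the given radius of center to its distance."""
    dist = {center: 0}
    frontier = [center]
    for e in range(radius):
        new_frontier = []
        for i in frontier:
            for j in sorted(neighbours(word, i)):
                if j not in dist:
                    dist[j] = e + 1   # j enters the (e+1)-sphere in this round
                    new_frontier.append(j)
        frontier = new_frontier
    return dist


layer_word = [("a", (1,)), ("a", (2,)), ("a", (3,)), ("a", (1,))]
layers_radius_1 = sphere_layers(layer_word, 1, 1)
layers_radius_2 = sphere_layers(layer_word, 1, 2)
```

Each round of the outer loop corresponds to one induction step from e to e + 1: only positions at distance exactly e + 1 are added.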

B. Comparison with Class Automata 

We compare class register automata to class automata [3], which have been
shown to capture all (extended) XPath queries. Class automata are a smooth
(undecidable) extension of data automata and, therefore, of class memory
automata. A class automaton is suitable to work over words (even trees) with
multiple data values. It consists of a pair $(A,B)$ where $A$ is a non-deterministic
letter-to-letter transducer from the label alphabet $\Sigma$ to some working alphabet $\Gamma$,
and $B$ is a finite automaton over $\Gamma \times \{0,1\}^m$. A data word $(a_1,d_1) \ldots (a_n,d_n) \in
(\Sigma \times \mathfrak{D}^m)^*$ is accepted if, for input $a_1 \ldots a_n$, there is some output $u_1 \ldots u_n \in \Gamma^*$
of $A$ such that, for all $d \in \mathfrak{D}$, the word $(u_1,b_1) \ldots (u_n,b_n) \in (\Gamma \times \{0,1\}^m)^*$ is
accepted by $B$. Hereby, $b_i^k = 1$ iff $d_i^k = d$.
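
The acceptance condition can be made concrete with a brute-force sketch. The encoding below is an assumption made for illustration (the text does not fix one): the transducer and the finite automaton are given as triples of an initial state, a transition dictionary, and a set of final states, and all function names are hypothetical. Since every data value that does not occur in the word produces the same all-zero bit vectors, checking the occurring values plus one fresh value covers all of the data domain.

```python
def run_nfa(nfa, letters):
    """Return True iff the finite automaton accepts the given letter sequence."""
    init, delta, finals = nfa
    current = {init}
    for letter in letters:
        current = {q2 for q in current for q2 in delta.get((q, letter), ())}
    return bool(current & finals)


def transducer_outputs(trans, labels):
    """Enumerate all outputs of the letter-to-letter transducer on the labels."""
    init, delta, finals = trans
    partial = [(init, ())]
    for a in labels:
        partial = [(q2, out + (u,))
                   for q, out in partial
                   for (u, q2) in delta.get((q, a), ())]
    return {out for q, out in partial if q in finals}


def class_automaton_accepts(trans, nfa, word):
    """word is a list of (label, data tuple); brute force over transducer outputs."""
    labels = [a for a, _ in word]
    values = {d for _, ds in word for d in ds}
    fresh = object()   # stands for any data value that does not occur in word
    for out in transducer_outputs(trans, labels):
        if all(run_nfa(nfa, [(u, tuple(int(x == d) for x in ds))
                             for u, (_, ds) in zip(out, word)])
               for d in values | {fresh}):
            return True
    return False


# Example with m = 1: the transducer copies the label "a"; the finite automaton
# rejects as soon as it sees a second position marked with bit 1.
identity = (0, {(0, "a"): {("a", 0)}}, {0})
at_most_one = (0, {(0, ("a", (0,))): {0},
                   (0, ("a", (1,))): {1},
                   (1, ("a", (0,))): {1}}, {0, 1})
```

With these two components, a data word is accepted exactly when no data value occurs twice, since the condition on the finite automaton is checked separately for every data value.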
We will show that, for $m = 2$, class automata capture neither EMSO logic nor
non-guessing class register automata. Note that class automata do not depend
on a signature. To allow for a fair comparison, we choose the simple signature

= {^+1 ' ^~ > 

Theorem 7. There is L e rEMSO(§^ 1 ^) n CMA"(S^ 1 ^) such that L cannot 
be recognized by any class automaton. 

Proof. Let E = {a} and D = K. Using [3], one can show that there is no class 
automaton that recognizes L = [{(a, 1, 1) . . . (a, n, n)(a, 1, 1) . . . (a, n, n) \ n > 
ljjs^ . It is, however, easy to define an rEMSO(§^_ 1 ^-sentence for L. We 
restrict to the construction of a non-guessing class register automaton, which is 
very similar to the automaton from Example^ Here, we will need four registers, 
r\ and r\ for k — 1,2. The crucial difference is in the second phase, where 
we encounter a data value for the second time. We henceforth require that, at 
position n + i, the fc-th data value d\ +i is contained in register r\ at prev^ fc (n + 
i) = i. The value d^ +i is henceforth stored in r\ and has to coincide, at position 
n + i + 1, with the contents of r\ at position prev^ fc (n + i + 1) = i + 1. □ 
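
The two position identities used in this proof can be checked on the representative word of L with a short sketch. It assumes that the predecessor for component k is the closest earlier position carrying the same k-th data value; this reading, the 0-based component index, and the names `representative_word` and `prev_same_value` are assumptions made for illustration.

```python
def representative_word(n):
    """Build (a,1,1)...(a,n,n)(a,1,1)...(a,n,n) as a list of (label, data tuple)."""
    first_half = [("a", (i, i)) for i in range(1, n + 1)]
    return first_half + first_half


def prev_same_value(word, k, pos):
    """Closest earlier position (1-based) with the same k-th data value, or None."""
    for j in range(pos - 1, 0, -1):
        if word[j - 1][1][k] == word[pos - 1][1][k]:
            return j
    return None


n = 3
w = representative_word(n)
second_phase = {(k, i): prev_same_value(w, k, n + i)
                for k in (0, 1) for i in range(1, n + 1)}
```

For every component k and every i in 1..n, the entry for (k, i) is the predecessor of position n + i, which is the position the automaton consults in the second phase.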